11C-PIB-PET for the early diagnosis of Alzheimer’s disease dementia and other dementias in people with mild cognitive impairment (MCI)




This is the protocol for a review and there is no abstract. The objectives are as follows:

To determine the diagnostic accuracy of the PIB-PET index test at various thresholds for detecting participants with mild cognitive impairment at baseline who would clinically convert to Alzheimer’s disease dementia or other forms of dementia at follow-up.

To investigate heterogeneity of test accuracy in the included studies.


The pathology of Alzheimer’s disease is present in the majority of cases of dementia.  As the dominant or sole pathology, it accounts for over 50% of dementia, afflicting 5% of men and 6% of women over the age of 60 worldwide (World Health Organization 2002). However, the strength of the link between cognitive impairment and the pathological features of Alzheimer’s disease varies both with age and with each of the different pathological features.  It has also been recognised that a significant number of individuals without clinical evidence of Alzheimer’s disease have amyloid deposition at death (Dickson 1992).  Indeed, epidemiological neuropathological studies have established that there is no significant relationship between amyloid plaque burden and cognitive impairment in those over the age of 90 (Savva 2009).

The term 'Alzheimer’s disease dementia' (ADD) is used to describe those in whom the symptoms of cognitive impairment have progressed gradually to the point where the ability of the patient to perform everyday functions has been affected.  Before this, there is a stage, known as mild cognitive impairment (MCI), in which the patient has a degree of cognitive impairment which is greater than expected for age, but is not impaired in function. Before MCI, there is a stage in which the pathology is present and increasing, but has not yet affected cognitive function, known as 'preclinical Alzheimer’s disease'. 

MCI is a heterogeneous condition. In this protocol MCI refers to the clinical criteria defined by Petersen, or the revised Petersen criteria (Petersen 1999; Petersen 2004; Winblad 2004), or to the Clinical Dementia Rating (CDR=0.5) scale (Morris 1993) or to the sixteen different classifications of MCI (Matthews 2008). There are four outcomes for those within an MCI population: progression to ADD, progression to another dementia, maintaining stable MCI, or recovery. At present, there is no clinical method to determine who, of those patients

The main concern of patients who present with worries about their cognitive function is whether there is a treatment which will either improve, or delay progression of their symptoms. The rate at which patients cross the boundary between preclinical Alzheimer’s disease and MCI, and between MCI and ADD, depends on several factors. Patients presenting to primary care are different from those in secondary care, who are different again from those in research settings.  Those with ApoE4 genotype progress more rapidly.  Within the 'MCI band', those with worse cognitive function progress more rapidly. Studies indicate that an annual average of 5% to 15% of MCI patients progress to ADD (Petersen 1999; Bruscoli 2004; Mattsson 2009; Petersen 2009).

Alzheimer’s disease pathology is associated with a central amyloidosis, and the presence of beta-amyloid plaques and neurofibrillary tangles in brain tissues at autopsy has been considered a 'gold standard' for the definitive diagnosis of Alzheimer’s disease (Mirra 1991; Winblad 1997; Newell 1999). However, Abeta amyloid plaques are present in conditions other than ADD (Villemagne 2008). Abeta amyloid deposits, measured with the radioactive tracer 'Pittsburgh Compound-B' (PIB), are higher in congophilic angiopathy (Johnson 2007) and dementias other than ADD. PIB retention and Abeta imaging in vivo could support a more accurate differential diagnosis of the dementias. For instance, PIB could differentiate Alzheimer's disease from frontotemporal dementia (FTD) (Rabinovici 2007; Rowe 2007; Drzezga 2008; Engler 2008). The role of the PIB positron emission tomography (PET) biomarker in dementia differential diagnostics will be evaluated in a number of separate Cochrane systematic reviews.

It is a reasonable assumption, and one on which this review is predicated, that if a patient has both MCI and the pathology of Alzheimer's disease, and then goes on to develop clinical ADD, then the cause of the initial MCI and of the ADD was the Alzheimer’s pathology. 

Our approach is an example of assessing diagnostic accuracy using 'delayed verification of diagnosis'. The distinction between 'prognosis' and 'diagnosis' is, in this circumstance, somewhat semantic. However, instead of being based on pathology, the reference standard is based on a clinical standard: the progression from MCI to ADD or other dementias. Although, for the reasons stated above, this introduces a degree of unreliability, it has the advantage of being based on what matters to patients: progression.

The PIB-PET biomarker results represent Abeta amyloid deposition in the brain. We will be looking at the relation between 11C-PIB ligand binding in the brain and: i) conversion from MCI to ADD; or ii) conversion from MCI to other forms of dementia.

Target condition being diagnosed

In this review there are two target conditions: i) ADD and ii) other forms of dementia, which will be assessed at follow-up.

We will compare the index test results obtained at baseline with the results of the reference standards obtained at follow-up (delayed verification). 

Index test(s)

11C-PIB-PET test for the detection of Abeta amyloid deposition in the brain regions (e.g. the frontal, parietal and temporal cortices, posterior cingulum, etc.) at baseline.

11C-PIB-PET is a molecular imaging biomarker. PIB is an N-methyl-[11C]2-(4'-methylaminophenyl)-6-hydroxybenzothiazole radiotracer derived from thioflavin T (Klunk 2004). Price 2005 fully evaluated quantitative PIB-PET data in order to identify a valid, simple and reliable PET quantification method for the routine measure of brain amyloid retention in vivo.

Clinical pathway

Dementia develops over several years. There is a presumed period when people are asymptomatic, and when pathology is accumulating. Individuals or their relatives may then notice subtle impairments of recent memory. Gradually, more cognitive domains become involved, and difficulty in planning complex tasks becomes increasingly apparent. In the UK, people usually present to their general practitioner, who may administer the index tests and refer the person to a hospital memory clinic. However, many people with dementia do not present until much later in the disorder and will follow a different pathway to diagnosis, for example being identified during an admission to general hospital for a physical illness. Thus, the pathway influences the accuracy of the diagnostic test. The accuracy of the test will vary with the experience of the administrator, and the accuracy of the subsequent diagnosis will vary with the history of referrals to the particular healthcare setting. Diagnostic assessment pathways may vary in other countries and diagnoses may be made by a variety of specialists including neurologists and geriatricians.

Alternative test(s)

We are not including alternative tests in this review because there are currently no standard practice tests available for the diagnosis of dementia. 

The Cochrane Dementia and Cognitive Improvement Group (CDCIG) is in the process of conducting a series of diagnostic test accuracy reviews of biomarkers and scales (see list below). Although we are conducting reviews on individual tests compared to a reference standard, we plan to compare our results in an overview.

  • 18F-FDG-PET (18F-2-fluoro-2-deoxy-D-glucose)

  • CSF (Cerebrospinal fluid analysis of Abeta and tau)

  • sMRI (structural magnetic resonance imaging)

  • Neuropsychological tests (MMSE; MiniCOG; MoCA)

  • Informant interviews (IQCODE; AD8)

  • APOE e4

  • FP-CIT SPECT (Fluoropropyl-Carbomethoxy-Iodophenyl-Tropane single-photon emission computed tomography)


The new diagnostic criteria for Alzheimer’s disease and MCI due to Alzheimer’s disease (Dubois 2010; Albert 2011; McKhann 2011; Sperling 2011) incorporate add-on biomarkers based on imaging or CSF measures. These add-on tests to core clinical criteria might increase the sensitivity or specificity of a testing strategy. It is crucial that each of these biomarkers is assessed for its diagnostic accuracy before being adopted as a routine add-on test in clinical practice.  

The 11C-PIB-PET biomarker, as the add-on test, might facilitate accurate identification of those patients with MCI who would convert to Alzheimer’s disease or other forms of dementia.  At the present time there is no 'cure' for dementia, but there are some treatments which can slow cognitive and functional decline, or reduce the associated behavioural and psychiatric symptoms of dementia (Birks 2006; McShane 2006). In addition, the accurate early diagnosis of dementia may improve opportunities for the use of newly-evolving interventions designed to delay or prevent progression to more debilitating stages of dementia (Oddo 2004). Coupled with appropriate contingency planning, proper recognition of the disease may also help to prevent inappropriate and potentially harmful admissions to hospital or institutional care (NAO 2007).


To determine the diagnostic accuracy of the PIB-PET index test at various thresholds for detecting participants with mild cognitive impairment at baseline who would clinically convert to Alzheimer’s disease dementia or other forms of dementia at follow-up.

Secondary objectives

To investigate heterogeneity of test accuracy in the included studies.


Criteria for considering studies for this review

Types of studies

We will consider longitudinal cohort studies in which index test results are obtained at baseline and the reference standard results at follow-up (see Index tests and Reference standards below). These studies necessarily employ delayed verification of conversion to dementia, and are sometimes labelled as 'delayed verification cross-sectional studies' (Knottnerus 2002; Bossuyt 2008).

We will include case control studies if they incorporate a delayed verification design. We believe this can only occur in the context of a cohort study, so these studies are invariably diagnostic nested case-control studies.


Participants recruited and clinically classified as those with mild cognitive impairment (MCI) at baseline will be eligible for this review.  The diagnosis for MCI will be established using the Petersen criteria or revised Petersen criteria (Petersen 1999; Petersen 2004; Winblad 2004) and/or Matthews 2008 criteria and/or CDR=0.5 (Morris 1993).  These criteria include: subjective complaints; a decline in memory objectively verified by neuropsychological testing in combination with a history from the patient; a decline in other cognitive domains; no or minimal impairment of activities of daily living; not meeting the criteria for dementia. Therefore, the eligible participants will have had a number of tests, e.g. neuropsychological tests for cognitive deficit and checklists for activities of daily living, prior to study entry. Participants will be defined as amnestic single domain or amnestic multiple domain or non-amnestic single domain or non-amnestic multiple domain or non-specified MCI participants.

Although the 11C-PIB-PET test is carried out only in tertiary care, sources of referrals might differ in this setting. We will examine the potential influence of different referral centres on diagnostic performance of the index test in the analyses.

We will exclude those studies that involved patients with MCI possibly caused by: i) current or past alcohol/drug abuse; ii) central nervous system (CNS) trauma (e.g. subdural haematoma), tumour or infection; iii) other neurological conditions, e.g. Parkinson’s or Huntington’s diseases.

Detail of the causes of study drop-outs is crucial, and if such data are missing the reliability of the conclusions must be questioned.

Index tests

11C-PIB-PET biomarker test.

There are currently no generally-accepted standards for PIB positivity threshold, and therefore it is not possible to pre-specify a 11C-PIB positivity threshold. 

Criteria for 11C-PIB positivity: we will use the criteria which were applied in each included primary study to classify participants as either 11C-PIB positive or 11C-PIB negative.

Measure of 11C-PIB amyloid retention (retention ratio): Distribution Volume Ratio (DVR), Standardised Uptake Value Ratio (SUVR) or other ratios.

Image analysis: not pre-specified (e.g. Statistical Parametric Mapping (SPM), MilxView medical image and analysis software or other image analysis techniques).

Time between 11C-PIB injection and PET acquisition: not pre-specified (e.g. 40 minutes; 60 minutes; 50 to 70 minutes; 90 minutes).

11C-PIB injection dose: not pre-specified (e.g. 300 MBq; 370 MBq).

11C-PIB retention detecting regions: not pre-specified (e.g. global cortex, caudate nucleus, putamen, thalamus, pons).

Target conditions

There are two target conditions in this review:

  1. Alzheimer’s disease dementia (ADD) (conversion from MCI to ADD)

  2. Any other forms of dementia (conversion from MCI to any other forms of dementia)

Reference standards

This will be progression to the target conditions. For the purpose of this review, several definitions of ADD are acceptable. Included studies may apply probable or possible NINCDS-ADRDA criteria (McKhann 1984). The Diagnostic and Statistical Manual of Mental Disorders (DSM) (DSMIII 1987; DSMIV 1994) and International Classification of Diseases (ICD) (World Health Organization 2010) definitions for ADD will also be acceptable. It should be noted that different iterations of these standards may not be directly comparable over time (e.g. DSM-IIIR vs. DSM-IV or ICD9 vs. ICD10). Moreover, the validity of the diagnoses may vary with the degree or manner in which the criteria have been operationalised (e.g. individual clinician vs. algorithm vs. consensus determination). We will consider all these issues in interpreting the results, using sensitivity analyses as appropriate.

Similarly, differing clinical definitions of other dementias are acceptable. For Lewy Body Dementia the reference standard is the McKeith criteria (McKeith 1996 or McKeith 2005); for fronto-temporal dementia the Lund criteria (Lund 1994), Neary 1988, Boxer 2005, DSM (DSMIII 1987; DSMIV 1994), ICD (World Health Organization 2010); and for vascular dementia the NINDS-AIREN criteria (Roman 1993), DSM (DSMIII 1987; DSMIV 1994) and ICD (World Health Organization 2010).

The time interval over which progression from MCI to ADD or other forms of dementia happened is also important. The minimum period of delay in the verification of the diagnosis (i.e. the time between the assessment at which a diagnosis of MCI is made and the assessment at which the diagnosis of dementia is made) is one year. Where a mean duration of follow-up is specified, we will exclude the study if the mean minus one standard deviation is less than one year, which will ensure that no more than 16% of participants were followed up for less than one year if the follow-up durations are normally distributed.
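The exclusion rule above can be checked numerically. The following is a minimal sketch only: the function names are our own, and it assumes, as the rule itself does, that follow-up durations are normally distributed.

```python
from statistics import NormalDist

MIN_FOLLOW_UP_YEARS = 1.0  # minimum delay in verification stated in the protocol


def meets_minimum_follow_up(mean_years: float, sd_years: float) -> bool:
    """Return True if mean minus one SD of follow-up is at least one year."""
    return mean_years - sd_years >= MIN_FOLLOW_UP_YEARS


def share_below_one_year(mean_years: float, sd_years: float) -> float:
    """Expected share of participants followed for less than one year,
    assuming normally distributed follow-up durations (an assumption)."""
    return NormalDist(mu=mean_years, sigma=sd_years).cdf(MIN_FOLLOW_UP_YEARS)


# A hypothetical study with mean follow-up 2.0 years (SD 1.0) sits exactly
# on the boundary: roughly 16% of participants fall below one year.
boundary_share = share_below_one_year(2.0, 1.0)
```

A study on the boundary (mean minus one SD equal to one year) gives a share of about 0.16, which is where the 16% figure comes from.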

If possible, we will segment analyses into separate follow-up mean periods for the delay in verification: 1 year to less than 2 years; 2 to less than 4 years; and greater than 4 years.  In this eventuality we will clearly note where the same included studies contribute to the analysis for more than one reference standard.

Search methods for identification of studies

We will use various information sources to ensure all relevant studies are included.  Search strategies for electronic database searching will be devised by the Trials Search Coordinator of the Cochrane Dementia and Cognitive Improvement Group.

Electronic searches

We will search MEDLINE (OvidSP), EMBASE (OvidSP), Science Citation Index (ISI Web of Knowledge), PsycINFO (Ovid), and LILACS (Bireme). See Appendix 1 for a proposed strategy to be run in MEDLINE (OvidSP). Similarly-structured search strategies will be designed using search terms and syntax appropriate for each database listed above. We will request a search of the Cochrane Register of Diagnostic Test Accuracy Studies (Cochrane Renal Group).

There will be no restriction based on language of study. Translation services will be used as necessary.

A single researcher with extensive experience of systematic reviews will perform initial searches.

Searching other resources

Grey literature: chosen electronic databases will include assessments of conference proceedings.

Handsearching: we will not perform handsearching as there is little published evidence of the benefits of handsearching for reports of diagnostic test accuracy (DTA) studies (Glanville 2010).

Reference lists: we will scan reference lists of all eligible studies and reviews in the field for further possible titles, and repeat the process until no new titles are found (Greenhalgh 2005).

Correspondence: we will contact research groups who have published or are conducting work on screening tests for dementia diagnosis. Groups for contact will be informed by the initial results of our literature search.

Data collection and analysis

Selection of studies

One researcher will screen all titles and abstracts generated by electronic database searches for relevance. 

Two researchers will independently review remaining abstracts of selected titles and select all potentially eligible studies for full paper review.  Two independent researchers will further assess full manuscripts against the inclusion criteria.  Where necessary, a third arbitrator will resolve disagreements that the two researchers cannot resolve through discussion.

Where a study may include useable data but these are not presented in the published manuscript, we will contact the authors directly to request further information.  If the same data set is presented in more than one paper, we will include the primary paper.

We will detail the numbers of studies selected at each point in a PRISMA flow diagram.

Data extraction and management

We will extract the following data on study characteristics:

Bibliographic details of primary paper:

  • Author, title of study, year and journal

Basic clinical and demographic details:

  • Number of subjects

  • Clinical diagnosis

  • MCI clinical criteria

  • Age

  • Gender

  • Referral centre(s)

  • Participant recruitment

  • Sampling procedures

Details of the index test:

  • Method of the PIB test administration, including who administered the test

  • Thresholds used to define positive and negative tests

  • Other technical aspects as seems relevant to the review, e.g. brain areas

Details of the reference standard:

  • Definition of ADD and other dementias used in reference standard

  • Duration of follow-up from time of index test used to define ADD and other dementias in reference standard: 1 year to < 2 years; 2 to < 4 years; and > 4 years. If participants have been followed for varied amounts of time we will record a mean follow-up period for each included study. If possible, we will group those data into minimum, maximum and median follow-up periods, which may then become the subject of subgroup analyses.

  • Prevalence or proportion of population developing ADD and other dementias, with severity, if described.

The results of the 2x2 tables cross-classifying the index test results against the results of the reference standards:

Table 1: Conversion from MCI to Alzheimer’s disease dementia

| Index test information | Reference standard information: ADD present | Reference standard information: ADD absent |
|---|---|---|
| Index test positive | PIB+ who convert to ADD (TP) | PIB+ who remain MCI (FP) & PIB+ who convert to non-ADD (FP) |
| Index test negative | PIB- who convert to ADD (FN) | PIB- who remain MCI (TN) & PIB- who convert to non-ADD (TN) |

Table 2: Conversion from MCI to non-Alzheimer’s disease dementia

| Index test information | Reference standard information: Non-ADD present | Reference standard information: Non-ADD absent |
|---|---|---|
| Index test positive | PIB+ who convert to non-ADD (TP) | PIB+ who remain MCI (FP) & PIB+ who convert to ADD (FP) |
| Index test negative | PIB- who convert to non-ADD (FN) | PIB- who remain MCI (TN) & PIB- who convert to ADD (TN) |

Table 3: Conversion from MCI to any form of dementia

| Index test information | Reference standard information: Any form of dementia present | Reference standard information: Dementia absent |
|---|---|---|
| Index test positive | PIB+ who convert to any form of dementia (TP) | PIB+ who remain MCI (FP) |
| Index test negative | PIB- who convert to any form of dementia (FN) | PIB- who remain MCI (TN) |

The numbers lost to follow-up

We will also extract data necessary for the assessment of quality, as defined below.

In general the data extraction pro-forma will be piloted against two included papers. Two researchers will extract the data independently.  Where necessary, a third arbitrator will resolve disagreements about data extraction that the two researchers cannot resolve through discussion.
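The three 2x2 tables described above can all be derived from the same six counts (PIB status by follow-up outcome). The sketch below illustrates that mapping; the class name, function name, dictionary keys and the counts themselves are invented for illustration and are not part of the extraction pro-forma.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # index test positive, target condition present
    fp: int  # index test positive, target condition absent
    fn: int  # index test negative, target condition present
    tn: int  # index test negative, target condition absent


def build_tables(pib_pos: dict, pib_neg: dict) -> dict:
    """Build Tables 1 to 3 from outcome counts keyed by
    "add", "non_add" and "stable_mci" (hypothetical key names)."""
    return {
        # Table 1: target condition is ADD
        "add": TwoByTwo(
            tp=pib_pos["add"],
            fp=pib_pos["stable_mci"] + pib_pos["non_add"],
            fn=pib_neg["add"],
            tn=pib_neg["stable_mci"] + pib_neg["non_add"],
        ),
        # Table 2: target condition is non-ADD dementia
        "non_add": TwoByTwo(
            tp=pib_pos["non_add"],
            fp=pib_pos["stable_mci"] + pib_pos["add"],
            fn=pib_neg["non_add"],
            tn=pib_neg["stable_mci"] + pib_neg["add"],
        ),
        # Table 3: target condition is any form of dementia
        "any_dementia": TwoByTwo(
            tp=pib_pos["add"] + pib_pos["non_add"],
            fp=pib_pos["stable_mci"],
            fn=pib_neg["add"] + pib_neg["non_add"],
            tn=pib_neg["stable_mci"],
        ),
    }


# Invented counts, for illustration only
tables = build_tables(
    pib_pos={"add": 20, "non_add": 3, "stable_mci": 7},
    pib_neg={"add": 2, "non_add": 4, "stable_mci": 24},
)
```

Note that a participant who converts to non-ADD dementia counts as a false positive or true negative in Table 1 but as a true positive or false negative in Table 2, so the same study contributes different cells depending on the target condition.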

Assessment of methodological quality

We will assess methodological quality of each study using the QUADAS-2 tool (Whiting 2011) as recommended by The Cochrane Collaboration.  The tool is made up of four domains:

  • Patient selection;

  • Index test;

  • Reference standard;

  • Patient flow. 

Each domain is assessed in terms of risk of bias, with the first three domains also considered in terms of applicability.  The components of each of these domains and a rubric which details how judgements concerning risk of bias are made are detailed in Appendix 2. Certain key areas important to quality assessment are participant selection, blinding and missing data.

We will pilot a QUADAS-2 assessment on two papers. If agreement is poor, we will refine the signalling questions. QUADAS-2 data will not be used to form a summary quality score. We will produce a narrative summary describing numbers of studies that were found to have high/low/unclear risk of bias as well as concerns regarding applicability.
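The counts needed for the narrative summary could be tallied as in the sketch below. The data structure, study identifiers and judgements are hypothetical; only the four domain names and the high/low/unclear categories come from the text above.

```python
from collections import Counter

DOMAINS = ("patient selection", "index test", "reference standard", "patient flow")
JUDGEMENTS = ("high", "low", "unclear")


def tally_risk_of_bias(assessments: dict) -> dict:
    """Count high/low/unclear judgements per QUADAS-2 domain.

    `assessments` maps a study identifier to {domain: judgement}.
    No summary quality score is computed, in line with the protocol.
    """
    tallies = {d: Counter({j: 0 for j in JUDGEMENTS}) for d in DOMAINS}
    for judgements in assessments.values():
        for domain in DOMAINS:
            tallies[domain][judgements[domain]] += 1
    return tallies


# Hypothetical assessments for two invented studies
example = {
    "Study A": {"patient selection": "high", "index test": "low",
                "reference standard": "unclear", "patient flow": "low"},
    "Study B": {"patient selection": "low", "index test": "low",
                "reference standard": "high", "patient flow": "low"},
}
summary = tally_risk_of_bias(example)
```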

Statistical analysis and data synthesis

We will apply the DTA framework for the analysis of a single test and extract the data from a study into a 2x2 table, showing the binary test results cross-classified with the binary reference standard and ignoring any censoring that might have occurred. We acknowledge that such a reduction in the data may represent a significant oversimplification. We will therefore adopt an intention to diagnose (ITD) approach as well. If possible, we will present what the result would be if all dropouts had developed dementia, and if no dropouts had developed dementia. To do this, we may also need to assume that the proportion of positive and negative test results is the same in the unknown as in the known participants.
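The two extreme ITD scenarios can be illustrated as follows. This is a sketch under a stated assumption: the index test result of every dropout is known (where it is not, dropouts would first have to be split into positive and negative in the same proportion as the known participants). Function names and counts are invented.

```python
def sens_spec(tp: int, fp: int, fn: int, tn: int) -> tuple:
    """Return (sensitivity, specificity) for one 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)


def itd_scenarios(tp, fp, fn, tn, dropouts_pos, dropouts_neg) -> dict:
    """Sensitivity and specificity under the two extreme assumptions
    about participants lost to follow-up."""
    return {
        # Every dropout developed dementia: PIB+ become TP, PIB- become FN
        "all_dropouts_converted": sens_spec(
            tp + dropouts_pos, fp, fn + dropouts_neg, tn
        ),
        # No dropout developed dementia: PIB+ become FP, PIB- become TN
        "no_dropouts_converted": sens_spec(
            tp, fp + dropouts_pos, fn, tn + dropouts_neg
        ),
    }


# Invented counts: 4 PIB-positive and 6 PIB-negative dropouts
scenarios = itd_scenarios(tp=20, fp=10, fn=2, tn=28, dropouts_pos=4, dropouts_neg=6)
```

With these invented counts, sensitivity ranges from 0.75 (all dropouts converted) to about 0.91 (none converted), which shows how strongly loss to follow-up can move the estimates.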

We will use data from the 2x2 tables abstracted from the included studies (TP, FN, FP, TN) and entered into RevMan software (RevMan 2012) to calculate the sensitivities, specificities and their 95% confidence intervals. We will also present individual study results graphically by plotting estimates of sensitivities and specificities in both a forest plot and a receiver operating characteristic (ROC) space. If more than one threshold is published in primary studies we will report accuracy estimates for all thresholds.
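For readers who want to see the underlying arithmetic, the sketch below computes sensitivity and specificity with 95% confidence intervals from one 2x2 table. It uses the Wilson score interval as an assumption; RevMan's own interval method may differ, and the counts are invented.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, total: int, confidence: float = 0.95) -> tuple:
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


def accuracy_with_ci(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Point estimates and 95% intervals for sensitivity and specificity."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
    }


# Invented counts, for illustration only
result = accuracy_with_ci(tp=20, fp=10, fn=2, tn=28)
```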

If there are sufficient data we will meta-analyse the pairs of sensitivity and specificity. The preferred approach would be the hierarchical summary ROC curve (HSROC) method proposed by Rutter and Gatsonis (Rutter 2001) (Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy, chapter 10; Macaskill 2010) because implicit thresholds are expected in primary studies. We will conduct these analyses in SAS (Statistical Analysis Software) (SAS Institute 2011) with support from the UK DTA Support Unit. Particularly if there are common thresholds across included studies, we will also consider the bivariate random-effects approach (Reitsma 2005). When a primary study reports results for more than one threshold, we will select only the threshold nearest to the upper left corner of the ROC space for the meta-analysis. We are aware that this data-driven method for threshold selection could lead to an overestimate of diagnostic accuracy (Leeflang 2008). However, there are no accepted thresholds to define positive PIB-PET tests and therefore the accuracy estimates reported in primary studies are likely to be based on data-driven threshold selection.
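The threshold selection rule can be sketched as below. "Nearest" is interpreted here as the smallest Euclidean distance to the point (false positive rate 0, sensitivity 1); that interpretation, the function name and the example cut-offs are assumptions made for illustration.

```python
from math import hypot


def nearest_to_upper_left(results: list) -> tuple:
    """Pick the (threshold, sensitivity, specificity) triple closest to the
    upper left corner of ROC space, i.e. to (1 - specificity, sensitivity) = (0, 1)."""
    return min(results, key=lambda r: hypot(1 - r[2], 1 - r[1]))


# Hypothetical retention-ratio cut-offs reported by a single primary study
reported = [
    (1.4, 0.95, 0.60),
    (1.5, 0.90, 0.80),
    (1.6, 0.70, 0.90),
]
selected = nearest_to_upper_left(reported)
```

Because the choice is made after seeing the results, this rule inherits the optimism that the paragraph above acknowledges.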

Where summary accuracy estimates emerge that are not affected by heterogeneity, we will explore their implications by considering the numbers of false positives and false negatives in populations with different prevalences of MCI, by presenting the results as natural frequencies, and by using alternative metrics such as likelihood ratios and predictive values.
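The conversion from summary estimates to natural frequencies and alternative metrics can be sketched as follows. Here `prevalence` is taken to be the proportion of the tested population who go on to develop the target condition; that reading, the function name and all input values are assumptions for illustration.

```python
def natural_frequencies(sensitivity: float, specificity: float,
                        prevalence: float, population: int = 1000) -> dict:
    """Expected 2x2 counts per `population` people tested, plus predictive
    values and likelihood ratios. Requires 0 < specificity < 1."""
    with_condition = prevalence * population
    without_condition = population - with_condition
    tp = sensitivity * with_condition
    fn = with_condition - tp
    tn = specificity * without_condition
    fp = without_condition - tn
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "ppv": tp / (tp + fp),                     # positive predictive value
        "npv": tn / (tn + fn),                     # negative predictive value
        "lr_pos": sensitivity / (1 - specificity),  # positive likelihood ratio
        "lr_neg": (1 - sensitivity) / specificity,  # negative likelihood ratio
    }


# Invented summary estimates applied to two hypothetical prevalences
low_prev = natural_frequencies(0.90, 0.80, prevalence=0.10)
high_prev = natural_frequencies(0.90, 0.80, prevalence=0.30)
```

The likelihood ratios are the same in both settings, while the predictive values and the numbers of false positives and false negatives change with prevalence.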

We will prepare a summary of results table regardless.

Investigations of heterogeneity

The framework for the investigation includes the following factors:

  • Spectrum of patients (mean age, gender, Mini-Mental State Examination (MMSE) score, APOE ε4 status). Concerning age, any studies that include 30% of patients below the age of 65 will be examined separately.     

  • Different referral centres: primary care versus memory clinic versus hospital. 

  • Different clinical criteria of MCI: Petersen criteria versus revised Petersen criteria versus CDR=0.5 criteria versus different MCI classification (Matthews 2008).

  • Index test: thresholds, if stated; differences in 11C-PIB retention ratio; differences in image analysis; differences in time between 11C-PIB injection and PET acquisition; differences in 11C-PIB injection dose; differences in 11C-PIB retention detecting regions.

  • Reference standard/s used: e.g. NINCDS-ADRDA vs DSM vs ICD10 for ADD.      

  • Duration of follow-up: 1 year to < 2 years versus 2 to < 4 years versus > 4 years.

  • Aspects of study quality, particularly inadequate blinding and loss to follow-up: we will consider separately those studies that have more than 20% drop-outs.

  • Potential conflict of interest.

To investigate the effects of the sources of heterogeneity, we will perform a descriptive analysis by visual examination of the forest plot of sensitivity and specificity and the ROC plot. If there are sufficient included studies, we will perform subgroup analyses using RevMan software (RevMan 2012). We will examine the influence of potential conflict of interest in a sensitivity analysis. In addition, we will perform an analysis that includes the potential sources of heterogeneity as covariates in the HSROC model in SAS.

Sensitivity analyses

If not already explored as part of the investigation of heterogeneity above, we will perform a sensitivity analysis, for example in order to investigate the influence of study quality on overall diagnostic accuracy of the PIB-PET biomarker. In this analysis we will omit studies at high risk of bias (see Appendix 2).

We will also perform a sensitivity analysis with and without the intention to diagnose (ITD) approach.

In addition, we will evaluate the effects of data-driven threshold selection studies on overall diagnostic accuracy of the PIB-PET biomarker by excluding them.

Assessment of reporting bias

We will not investigate reporting bias because of current uncertainty about how it operates in test accuracy studies and about the interpretation of existing analytical tools such as funnel plots. The effect of the presence of potential conflicts of interest may be investigated as part of any assessment of heterogeneity.


Appendix 1. MEDLINE search strategy

MEDLINE In-process and other non-indexed citations and MEDLINE, 1950-present (Ovid SP)

1. Positron-Emission Tomography/

2. (PiB or PIB).ti,ab.

3. "Pittsburgh compound B".ti,ab.

4. "C Pittsburgh".ti,ab.

5. (PIB-PET or PET-PIB).ti,ab.

6. ("amyloid deposit*" OR "abeta deposit*" OR "amyloid burden").ti,ab.

7. "[11C]PIB".ti,ab.

8. ("amyloid ligand*" OR "amyloid radioligand*").ti,ab.

9. ((PET and (scan* or imag*)) or "positron emission tomography").ti,ab.

10. or/1-9

11. (alzheimer* or dement* or AD or lewy* or DLB or LBD).ti,ab.

12. exp dementia/ OR Mild Cognitive Impairment/

13. ((cognit* or memory or cerebr* or mental*) adj3 (declin* or impair* or los* or deteriorat* or degenerat* or complain* or disturb* or disorder*)).ti,ab.

14. MCI.ti,ab.

15. ACMI.ti,ab.

16. ARCD.ti,ab.

17. SMC.ti,ab.

18. CIND.ti,ab.

19. BSF.ti,ab.

20. AAMI.ti,ab.

21. LCD.ti,ab.

22. QD.ti,ab.

23. AACD.ti,ab.

24. MNCD.ti,ab.

25. MCD.ti,ab.

26. (nMCI or aMCI or mMCI).ti,ab.

27. ("N-MCI" or "A-MCI" or "M-MCI").ti,ab.

28. ("CDR 0.5" or "clinical dementia rating scale 0.5" or "0.5 CDR").ti,ab.

29. ("GDS 3" or "3 GDS").ti,ab.

30. ("global deterioration scale" and "stage 3").ti,ab.

31. or/11-30

32. 10 and 31

33. (animals not (humans and animals)).sh.

34. 32 not 33

35. (2000* or 2012*).ed.

36. 34 and 35

Appendix 2. Assessment of methodological quality table QUADAS-2 tool

| Domain | Patient selection | Index test | Reference standard | Patient flow |
|---|---|---|---|---|
| Description | Describe methods of patient selection: describe included patients (prior testing, presentation, intended use of index test and setting) | Describe the index test and how it was conducted and interpreted | Describe the reference standard and how it was conducted and interpreted | Describe any patients who did not receive the index test(s) and/or reference standard or who were excluded from the 2x2 table (refer to flow diagram): describe the time interval and any interventions between index test(s) and reference standard |
| Signalling questions | Was a consecutive or random sample of patients enrolled? | Were the index test results interpreted without knowledge of the results of the reference standard? | Is the reference standard likely to correctly classify the target condition? | Was there an appropriate interval between index test(s) and reference standard? |
| | Was a case-control design avoided? | If a threshold was used, was it pre-specified? | Were the reference standard results interpreted without knowledge of the results of the index test? | Did all patients receive a reference standard? |
| | Did the study avoid inappropriate exclusions? | | | Did all patients receive the same reference standard? |
| | | | | Were all patients included in the analysis? |
| Risk of bias: High/low/unclear | Could the selection of patients have introduced bias? | Could the conduct or interpretation of the index test have introduced bias? | Could the reference standard, its conduct, or its interpretation have introduced bias? | Could the patient flow have introduced bias? |
| Concerns regarding applicability: High/low/unclear | Are there concerns that the included patients do not match the review question? | Are there concerns that the index test, its conduct, or interpretation differ from the review question? | Are there concerns that the target condition as defined by the reference standard does not match the review question? | |

Appendix 3. Anchoring statements for quality assessment of PIB-PET diagnostic studies

Table 4: Review question and inclusion criteria

| Category | Review Question | Inclusion Criteria |
|---|---|---|
| Patients | Participants with mild cognitive impairment, no dementia | Participants fulfilling the criteria for the clinical diagnosis of MCI at baseline |
| Index Test | 11C-PIB-PET biomarker | 11C-PIB-PET biomarker |
| Target Condition | Alzheimer’s disease dementia (conversion from MCI to Alzheimer’s disease dementia); any other forms of dementia (conversion from MCI to any other forms of dementia) | Alzheimer’s disease dementia (conversion from MCI to Alzheimer’s disease dementia); any other forms of dementia (conversion from MCI to any other forms of dementia) |
| Reference Standard | NINCDS-ADRDA; DSM; ICD; McKeith criteria; Lund criteria; NINDS-AIREN criteria | NINCDS-ADRDA; DSM; ICD; McKeith criteria; Lund criteria; NINDS-AIREN criteria |
| Outcome | N/A | Data to construct 2x2 table |
| Study Design | N/A | Longitudinal cohort studies; nested case-control studies if they incorporate a delayed verification design (case-control nested in cohort studies) |


Anchoring statements for quality assessment of PIB-PET diagnostic studies

We provide some core anchoring statements for quality assessment in diagnostic test accuracy reviews of the PIB-PET biomarker in dementia.  These statements are designed for use with the QUADAS-2 tool and are based on the guidance for quality assessment of diagnostic test accuracy reviews of IQCODE in dementia (Quinn 2012).

During the two-day, multidisciplinary focus group and the piloting/validation of the guidance, it was clear that certain issues were key to assessing quality, while other issues were important to record but less important for assessing overall quality.  To assist, we describe a 'weighting' system.  Where an item is weighted 'high risk', that section of the QUADAS-2 results table is likely to be scored as high risk of bias.  For example, in dementia diagnostic test accuracy studies, ensuring that clinicians performing dementia assessment are blinded to results of the index test is fundamental.  If this blinding was not present, the item on the reference standard should be scored 'high risk of bias', regardless of the other contributory elements.

In assessing individual items, a score of 'unclear' should only be given if there is genuine uncertainty.  In these situations, review authors will contact the relevant study teams for additional information.
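
The weighting rule can be illustrated with a short Python sketch. It is not part of the QUADAS-2 tool or of this protocol: the function name `domain_risk_of_bias` is hypothetical, and the treatment of 'unclear' answers (the domain is 'unclear' when no item is answered 'no') is an assumption made for illustration only.

```python
def domain_risk_of_bias(answers):
    """Combine signalling-question answers for one QUADAS-2 domain.

    answers: iterable of 'yes', 'no' or 'unclear', one per anchoring question.
    Assumption (illustrative): every item passed in carries a 'high risk'
    weighting, so a single 'no' makes the whole domain high risk.
    """
    normalised = [a.strip().lower() for a in answers]
    unknown = set(normalised) - {"yes", "no", "unclear"}
    if unknown:
        raise ValueError(f"unexpected answers: {sorted(unknown)}")
    if "no" in normalised:
        # Any 'no' on a high-weight item overrides the other answers.
        return "high"
    if "unclear" in normalised:
        # Assumed convention: genuine uncertainty keeps the domain unclear
        # until study authors have been contacted.
        return "unclear"
    return "low"
```

Under these assumptions, a reference standard domain with unblinded clinical assessment (answer 'no') is scored 'high' whatever the other answers are.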

Anchoring statements to assist with assessment for risk of bias

Patient selection

Was the sampling method appropriate?

Where sampling is used, the designs least likely to cause bias are consecutive sampling or random sampling.  Sampling that is based on volunteers or selecting subjects from a clinic or research resource is prone to bias.

Weighting: High risk of bias (‘no’)

Was a case-control or similar design avoided?

Designs similar to case-control that may introduce bias are those in which the study team deliberately increases or decreases the proportion of subjects with the target condition, so that the sample may not be representative.  For example, a population study may be enriched with extra dementia subjects from a secondary care setting, who typically have more advanced disease. Some case-control methods may already be excluded if they mix subjects from various settings.

Weighting: High risk of bias (‘no’)

Are exclusion criteria described and appropriate?

The study will be automatically graded as unclear if exclusions are not detailed (pending contact with study authors).  Where exclusions are detailed, the study will be graded as 'low risk' if exclusions are felt to be appropriate by the review authors.  Certain exclusions common to many studies of dementia are: medical instability; terminal disease; alcohol/substance misuse; concomitant psychiatric diagnosis; other neurodegenerative condition. Exclusions are not felt to be appropriate if ‘difficult to diagnose’ patients are excluded.

Post hoc and inappropriate exclusions will be labelled 'high risk' of bias.

Weighting: High risk (‘no’)

Index Test

Was PIB-PET assessment/interpretation performed without knowledge of clinical dementia diagnosis?

Terms such as 'blinded' or 'independently and without knowledge of' are sufficient and full details of the blinding procedure are not required.  Interpretation of the results of the index test may be influenced by knowledge of the results of the reference standard. If the index test is always interpreted prior to the reference standard, then the person interpreting the index test cannot be aware of the results of the reference standard and so this item could be rated as ‘yes’.

For certain index tests the result is objective and knowledge of the reference standard should not influence the result, for example the level of protein in cerebrospinal fluid. In this instance the quality assessment may be 'low risk' even if blinding was not achieved.

Weighting: High risk (‘no’)

Were PIB-PET thresholds pre-specified?

For scales and biomarkers there is often a reference point (in units or categories) above which subjects are classified as 'test positive'; this may be referred to as threshold; clinical cut-off; or dichotomisation point.  A study is classified 'high risk of bias' if the authors define the optimal cut-off post-hoc based on their own study data because selecting the threshold to maximise sensitivity and/or specificity may lead to over-optimistic measures of test performance.

Certain papers may use an alternative methodology for analysis that does not use thresholds and these papers should be classified as not applicable.

Weighting: High risk (‘no’)
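
As an illustration of dichotomisation at a pre-specified threshold, the following Python sketch cross-classifies baseline index test results against conversion at follow-up to give the 2x2 table, and derives sensitivity and specificity from it. The function names, the use of `>=` to define 'test positive', and the assumption that higher retention indicates a positive test are illustrative choices, not specifications from this protocol.

```python
def two_by_two(retention_values, converted, threshold):
    """Cross-classify index test results against the reference standard.

    retention_values: PIB retention measure per participant at baseline
    converted: True if the participant converted to dementia at follow-up
    threshold: pre-specified cut-off; values >= threshold count as
               'test positive' (direction and >= are assumptions)
    """
    if len(retention_values) != len(converted):
        raise ValueError("inputs must have the same length")
    tp = fp = fn = tn = 0
    for value, is_case in zip(retention_values, converted):
        positive = value >= threshold
        if positive and is_case:
            tp += 1
        elif positive:
            fp += 1
        elif is_case:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def sensitivity_specificity(table):
    """Return (sensitivity, specificity); None where a denominator is zero."""
    cases = table["TP"] + table["FN"]
    non_cases = table["FP"] + table["TN"]
    sens = table["TP"] / cases if cases else None
    spec = table["TN"] / non_cases if non_cases else None
    return sens, spec
```

Because `threshold` is an input fixed before the data are seen, the sketch mirrors the low-risk case. Choosing the threshold that maximises these two values on the same data would correspond to the post-hoc selection described above.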

Reference Standard

Is the assessment used for clinical diagnosis of dementia acceptable?

Commonly used international criteria to assist with clinical diagnosis of dementia include those detailed in DSM-IV and ICD-10.  Criteria specific to dementia subtypes include but are not limited to NINCDS-ADRDA criteria for Alzheimer’s dementia; McKeith criteria for Lewy Body dementia; Lund criteria for frontotemporal dementias; and the NINDS-AIREN criteria for vascular dementia.  Where the criteria used for assessment are not familiar to the review authors or the Cochrane Dementia and Cognitive Improvement group (‘unclear’), this item should be classified as 'high risk of bias'.

Weighting: High risk (‘no’)

Was clinical assessment for dementia performed without knowledge of the PIB-PET results?

Terms such as 'blinded' or 'independently and without knowledge of' are sufficient and full details of the blinding procedure are not required.  Interpretation of the results of the reference standard may be influenced by knowledge of the results of the index test.

Weighting: High risk (‘no’)

Patient flow

Was there an appropriate interval between PIB-PET and clinical dementia assessment?

As we test the accuracy of the PIB-PET test for MCI conversion to dementia, there will always be a delay between the index test and the reference standard assessments. The time between reference standard and index test will influence the accuracy (Geslani 2005; Visser 2006; Okello 2009), and therefore we will note time as a separate variable (both within and between studies) and will test its influence on the diagnostic accuracy. We have set a minimum mean time to follow-up assessment of one year. If more than 16% of subjects have assessment for MCI conversion before nine months, this item will score ‘no’.

Weighting: High risk (‘no’)
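
The interval rule can be sketched in Python as follows. The function name and parameter names are hypothetical. Scoring the item 'no' when the mean follow-up is below one year is an assumption, since the text states the consequence explicitly only for the nine-month rule.

```python
def interval_item(follow_up_months, min_mean_months=12.0,
                  early_cutoff_months=9.0, max_early_fraction=0.16):
    """Score the 'appropriate interval' item as 'yes' or 'no'.

    follow_up_months: time from PIB-PET to clinical dementia assessment,
                      in months, one value per participant.
    """
    if not follow_up_months:
        raise ValueError("no follow-up times supplied")
    n = len(follow_up_months)
    mean_months = sum(follow_up_months) / n
    # Proportion of participants assessed before the nine-month cut-off.
    early_fraction = sum(m < early_cutoff_months for m in follow_up_months) / n
    if early_fraction > max_early_fraction:
        return "no"
    if mean_months < min_mean_months:
        # Assumption: a mean below the stated minimum also scores 'no'.
        return "no"
    return "yes"
```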

Did all subjects get the same assessment for dementia regardless of PIB-PET result?

There may be scenarios where subjects who score 'test positive' on the index test have a more detailed assessment.  Where dementia assessment differs between subjects, this should be classified as 'high risk of bias'.

Weighting: High risk (‘no’)

Were all patients who received PIB-PET assessment included in the final analysis?

If the number of patients enrolled differs from the number of patients included in the 2x2 table then there is the potential for bias. If patients lost to follow-up differ systematically from those who remain, then estimates of test performance may differ.

If there are drop-outs, these should be accounted for; a maximum proportion of drop-outs to remain low risk of bias has been specified as 20%. Detail of the causes of study drop-outs is crucial and if such data are missing the reliability of the conclusions must be questioned.

Weighting: High risk (‘no’)

Were missing PIB-PET results or uninterpretable PIB-PET results reported?

Where missing or uninterpretable results are reported, and if there is substantial attrition (we have set an arbitrary value of 50% missing data), this should be scored as ‘no’.  If those results are not reported, this should be scored as ‘unclear’ and authors will be contacted.

Weighting: High risk (‘no’ and ‘unclear’)
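
The two numeric limits in the patient flow items (20% drop-outs and 50% missing data) can be expressed as simple checks. The Python sketch below is illustrative only: the function names are hypothetical, and whether a value exactly at a limit passes or fails is not stated in the text, so the boundary handling shown is an assumption.

```python
def dropout_item(n_enrolled, n_analysed, max_dropout_fraction=0.20):
    """'yes' if the drop-out proportion is within the 20% limit, else 'no'.

    Assumption: exactly 20% drop-outs still counts as low risk.
    """
    if n_enrolled <= 0 or not 0 <= n_analysed <= n_enrolled:
        raise ValueError("need n_enrolled > 0 and 0 <= n_analysed <= n_enrolled")
    dropout_fraction = (n_enrolled - n_analysed) / n_enrolled
    return "yes" if dropout_fraction <= max_dropout_fraction else "no"


def missing_results_item(n_scanned, n_missing=None, max_missing_fraction=0.50):
    """Score reporting of missing or uninterpretable PIB-PET results.

    n_missing=None means the study did not report these results, which
    is scored 'unclear' (study authors would then be contacted).
    Assumption: 50% or more missing counts as substantial attrition.
    """
    if n_missing is None:
        return "unclear"
    if n_scanned <= 0 or not 0 <= n_missing <= n_scanned:
        raise ValueError("need n_scanned > 0 and 0 <= n_missing <= n_scanned")
    if n_missing / n_scanned >= max_missing_fraction:
        return "no"
    return "yes"
```

Both 'no' and 'unclear' from `missing_results_item` map to high risk under the weighting given for this item.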

Anchoring statements to assist with assessment for applicability

Patient selection

Were included patients representative of the general population of interest?

The included patients should match the intended population as described in the review question.  The review authors should consider population in terms of symptoms; pre-testing; potential disease prevalence; setting.

We recognise that identifying all MCI patients in a given population may be particularly hard to achieve; therefore the information about the judgements for this criterion is particularly likely to be sub-optimal. We expect that all included studies will be sub-optimally reported to some degree. If there is a clear ground for suspecting an unrepresentative spectrum the item should be rated 'poor applicability'.

Index test

Are there concerns that the index test differs from the review question?

In this review we are looking at the level of PIB retention in the brain at baseline (the number of participants with PIB-positive and PIB-negative tests at baseline according to the threshold).  Certain studies look at the quantitative change in PIB retention from baseline to follow-up. If the accuracy of the index test was based on the level of change in PIB retention, the item should be rated 'poor applicability'.

Were sufficient data on PIB-PET application given for the test to be repeated in an independent study?

Variation in technology, test execution, and test interpretation may affect the estimate of accuracy. Particular points of interest include Abeta amyloid deposition in the different brain regions (e.g. the frontal, parietal and temporal cortices, posterior cingulum, etc). In addition, the background and training/expertise of the assessor should be reported and taken into consideration. If PIB-PET was not performed consistently this item should be rated 'poor applicability'.

Reference Standard

Was clinical diagnosis of dementia made in a manner similar to current clinical practice?

For many reviews, inclusion criteria and assessment of risk of bias will already have assessed the dementia diagnosis.  For certain reviews an applicability statement relating to reference standard may not be applicable.  There is the possibility that a form of dementia assessment, although valid, may diagnose a far larger proportion of subjects with disease than usual clinical practice.  In this instance the item should be rated 'poor applicability'.

Declarations of interest

None known