CSF tau and the CSF tau/ABeta ratio for the diagnosis of Alzheimer's disease dementia and other dementias in people with mild cognitive impairment (MCI)




This is the protocol for a review and there is no abstract. The objectives are as follows:

To determine the diagnostic accuracy of 1) CSF tau, 2) CSF p-tau, 3) the CSF tau/ABeta ratio and 4) the CSF p-tau/ABeta ratio index tests at various thresholds for detecting participants with mild cognitive impairment (MCI) at baseline who would clinically convert to Alzheimer’s disease or other forms of dementia at follow-up.

To investigate the amount of and associations of heterogeneity in the included studies of test accuracy.

We expect that heterogeneity will be likely and that it will be an important component of the review. The potential sources of heterogeneity, which will be used as a framework for the investigation of heterogeneity, include target population, index test, target disorder and study quality.


Dementia is a progressive syndrome of global cognitive impairment with resultant functional decline. In the United Kingdom (UK), it affects 5% of the population over 65 and 25% of those over 85 (Knapp 2007). Worldwide, an estimated 36 million people were living with dementia in 2010 (Wimo 2010), and this will increase to over 115 million by 2050. The greatest increases in prevalence will be seen in the developing regions. By 2040, China and its western-Pacific neighbours are predicted to have 26 million people living with dementia (Ferri 2005).

Dementia encompasses a group of neurodegenerative disorders that are characterised by progressive loss of cognitive function and ability to perform activities of daily living, that can be accompanied by neuropsychiatric symptoms and challenging behaviours of varying type and severity.  The underlying pathology is usually degenerative and subtypes of dementia include Alzheimer’s disease dementia, vascular dementia, dementia with Lewy bodies, and frontotemporal dementia.  There may be considerable overlap in the clinical and pathological presentations (MRC CFAS 2001), and there is often co-existence of Alzheimer’s disease dementia, vascular dementia and other causes of neuronal atrophy (Matthews 2009; Savva 2009). 

Alzheimer’s disease dementia is an incurable, progressive, neurodegenerative condition which accounts for over 50% of dementias, afflicting 5% of men and 6% of women over the age of 60 worldwide (World Health Organization 2010).  Its prevalence increases exponentially with age, with Alzheimer’s dementia affecting less than 1% of people aged from 60 to 64 years, but 24% to 33% of those over the age of 85 (Ferri 2005).

More than a dozen definitions have been used to describe cognitive impairment that is somehow qualitatively different from so-called ‘normal’ ageing. The first complaints in patients on the Alzheimer’s disease spectrum are often cognitive, such as problems with planning and judgement, as well as the more characteristic memory complaints. This may lead to a diagnosis of mild cognitive impairment (MCI) if formal testing reveals objective evidence of cognitive impairment. It has not previously been mandated which psychometric tests should be used to define cognitive impairment objectively. However, the objectivity of the cognitive impairment is critical as it differentiates this population from a group with subjective cognitive impairment, which is more likely to have a non-neurodegenerative aetiology. MCI is a heterogeneous condition, the diagnosis of which holds very little prognostic significance. There are four outcomes for those within an MCI population: progression to Alzheimer’s disease dementia, progression to another dementia, maintaining stable MCI and recovery. Currently, sixteen different classifications are used to define MCI (Matthews 2008). In this protocol MCI refers to this extended definition of MCI, or to the clinical criteria defined by the Petersen criteria or revised Petersen criteria (Petersen 1999; Petersen 2004; Winblad 2004), or to a score of 0.5 on the Clinical Dementia Rating (CDR) scale (Morris 1993).

Studies (Petersen 1999; Bruscoli 2004; Mattsson 2009; Petersen 2009) indicate that an annual average of 5% to 15% of MCI patients progress to Alzheimer’s disease dementia. The rate depends on clinical profile, setting and investigation for vascular disease. At present there is no clinical method to determine accurately which patients with MCI will develop Alzheimer’s disease dementia or other dementia subtypes.

Research suggests that measurable change in positron emission tomography (PET), magnetic resonance imaging (MRI) and cerebrospinal fluid (CSF) biomarkers occurs years in advance of the onset of clinical symptoms (Beckett 2010). In this protocol we aim to assess the ability of:

  1. CSF tau,

  2. CSF phosphorylated tau (p-tau),

  3. The CSF tau/ABeta ratio, and

  4. The CSF p-tau/ABeta ratio,

to enable the detection of Alzheimer’s dementia and other dementia subtypes in patients with MCI. These biomarkers have been chosen as they are considered to be the most intimately expressed biomarkers of the Alzheimer's disease core pathology, namely the aggregation and fibrillisation of the amyloid plaque and hyperphosphorylation of tau. Consequently, these biomarkers have been proposed as important in new criteria for Alzheimer's dementia that incorporate biomarker abnormalities.

Target condition being diagnosed

In this review there are two target conditions: i) Alzheimer's disease dementia and ii) other forms of dementia, which will be assessed at follow-up.

We will be comparing the index test results obtained at baseline with the results of the reference standard obtained at follow-up (delayed verification of diagnosis). 

Index test(s)

This review is part of a suite of reviews for assessing the accuracy of PET, MRI and other index tests (please see the Alternative test(s) section) in identifying those patients without clinical onset of dementia, who would develop Alzheimer's dementia or other forms of dementia during follow-up. We would consider the following.

Tau and phosphorylated tau (p-tau) CSF biomarker tests

Tau is a microtubule-associated protein located primarily in neuronal axons. There are six different human isoforms, each of which has multiple phosphorylation sites. Physiologically tau interacts with tubulin and plays an important role in the organisation and stabilisation of microtubules. Independent of phosphorylation status, slightly increased levels of CSF total tau (t-tau) have been associated with ageing, vascular dementia, multiple sclerosis, AIDS dementia, head injury and tauopathy; significant increases with Creutzfeldt-Jakob disease and meningoencephalitis; and a threefold increase has been seen in Alzheimer’s disease compared to normal controls (Shoji 2002). Systematic review of CSF biomarkers for Alzheimer’s disease in 2001, analysing 41 studies of CSF t-tau, demonstrated a specificity of 90% and sensitivity of 81% in diagnosing the condition (Blennow 2003).

The p-tau protein also has a number of potential phosphorylation sites (Billingsley 1997) and abnormal hyperphosphorylation has been shown to be associated with microtubule disruption and the formation of neurofibrillary tangles, dystrophic neurites surrounded by neuritic plaques, and neuropil threads, major components of Alzheimer’s disease pathophysiology (see Mandelkow 1998). Systematic review in 2001 of 11 studies of CSF p-tau in Alzheimer’s disease indicated a diagnostic specificity and sensitivity of 92% and 80% respectively (Blennow 2003).

There is great interest in the use of biomarkers and imaging techniques for predicting progression from MCI to Alzheimer’s disease dementia and other dementia subtypes. The international consortium study Alzheimer’s Disease Neuroimaging Initiative (ADNI), performed between 2004 and 2009, has so far been a key cohort study for predicting the progression from MCI to Alzheimer’s disease using biomarkers. It demonstrated a sensitivity and specificity of 70% and 92% for CSF t-tau, and of 68% and 73% for CSF pTau181, respectively (Petersen 2010).

Tau/ABeta ratio and p-tau/ABeta ratio CSF biomarker tests

ABeta is produced mainly by neurons, secreted into the CSF and then cleared through the blood-brain barrier and degraded by the reticuloendothelial system. ABeta levels are thus regulated in strict equilibrium between the brain, CSF and blood (Shoji 1992), but in Alzheimer’s disease patients ABeta42 forms insoluble amyloid and accumulates as intra-cerebral fibrils, resulting in decreased levels of CSF ABeta42 (Shoji 2001).

ABeta in CSF has only modest potential as a test for delayed verification of Alzheimer’s disease (Ritchie 2013), with meta-analysis of studies being hampered by poor methodological quality (Noel-Storr 2013) and multiple thresholds being reported between studies (Ritchie 2011).

In 2001, the American Academy of Neurology produced practical guidelines for dementia, including three Class II or III reports in a systematic review of a combination study of ABeta42 and t-tau CSF levels. The sensitivity and specificity for diagnosis of Alzheimer’s disease were 85% and 87% (Knopman 2001), supported by the 2001 systematic review revealing 83% to 100% sensitivity and 85% to 95% specificity for the CSF ABeta42 and total tau combination assay (Blennow 2003).

Again, the ADNI cohort study demonstrated that the total tau/ABeta42 ratio could be used to predict conversion from MCI to Alzheimer’s disease dementia, revealing a sensitivity of 86% and specificity of 85% (Petersen 2010).

Clinical pathway

Dementia develops over a trajectory of several years.  There is a presumed period when people are asymptomatic, and when pathology is accumulating.  Individuals or their relatives may then notice subtle impairments of recent memory.  Gradually, more cognitive domains become involved, and difficulty planning complex tasks becomes increasingly apparent.  In the UK, people usually present to their general practitioner, who may administer the index tests, and may refer the person to a hospital memory clinic.  However many people with dementia do not present until much later in the disorder and will follow a different pathway to diagnosis, for example being identified during an admission to general hospital for a physical illness.  Thus the pathway influences the accuracy of the diagnostic test.  The accuracy of the test will vary with the experience of the administrator, and the accuracy of the subsequent diagnosis will vary with the history of referrals to the particular healthcare setting. Diagnostic assessment pathways may vary in other countries and diagnoses may be made by a variety of specialists including neurologists and geriatricians. 

Role of index test(s)

The sampling of CSF and assay for levels of tau and ABeta could have a role when applied in specialist clinics. Due to the costs, risks and complexity of the testing, they will not be applied in a primary care setting. The role of this index test is as an add-on biomarker test, which has been proposed in new research diagnostic criteria to complement clinical examination and cognitive tests. It is these clinical and cognitive tests which have defined the MCI population under study in this review.

Alternative test(s)

We are not including alternative tests in this review because there are currently no standard practice tests available for the diagnosis of dementia. 

The Cochrane Dementia and Cognitive Improvement Group (CDCIG) is in the process of conducting a series of diagnostic test accuracy reviews of the biomarkers and scales listed below. Although we are conducting reviews on individual tests compared to a reference standard, we plan to compare our results in an overview.

  • FDG-PET (18F-fluorodeoxyglucose positron emission tomography)

  • 11C-PIB-PET (Pittsburgh Compound-B positron emission tomography)

  • sMRI (structural magnetic resonance imaging)

  • Neuropsychological tests (MMSE; MiniCOG; MoCA)

  • Informant interviews (IQCODE; AD8)

  • APOE e4

  • FP-CIT SPECT (Fluoropropyl-Carbomethoxy-Iodophenyl-Tropane single-photon emission computed tomography)


The new diagnostic criteria for Alzheimer’s disease and MCI due to Alzheimer’s disease (Dubois 2010; Albert 2011; McKhann 2011; Sperling 2011) incorporate add-on biomarkers based on imaging or CSF measures. These add-on tests to core clinical criteria might increase the sensitivity or specificity of a testing strategy. It is crucial that each of these biomarkers is assessed for its diagnostic accuracy before it is adopted as a routine add-on test in clinical practice. It is worth noting that in all these criteria, a single abnormality in any of the proposed tests is considered diagnostic of prodromal Alzheimer’s dementia.

CSF biomarkers might improve diagnoses and thereby treatments and patient outcomes.  At the present time there is no 'cure' for dementia, but there are some treatments which can slow cognitive and functional decline, or reduce the associated behavioural and psychiatric symptoms of dementia (Birks 2006; McShane 2006).  Furthermore, if Alzheimer’s disease can be diagnosed at an earlier, pre-dementia stage, this could present interventions with a critical window for enhanced likelihood of effect as well as help people with dementia, their families and potential carers make timely plans for the future.  Coupled with appropriate contingency planning, proper recognition of the disease may also help to prevent inappropriate and potentially harmful admissions to hospital or institutional care (Bourne 2007).  In addition, the accurate early identification of dementia may improve opportunities for the use of newly evolving interventions designed to delay or prevent progression to more debilitating stages of dementia. 


To determine the diagnostic accuracy of 1) CSF tau, 2) CSF p-tau, 3) the CSF tau/ABeta ratio and 4) the CSF p-tau/ABeta ratio index tests at various thresholds for detecting participants with mild cognitive impairment (MCI) at baseline who would clinically convert to Alzheimer’s disease or other forms of dementia at follow-up.

Secondary objectives

To investigate the amount of and associations of heterogeneity in the included studies of test accuracy.

We expect that heterogeneity will be likely and that it will be an important component of the review. The potential sources of heterogeneity, which will be used as a framework for the investigation of heterogeneity, include target population, index test, target disorder and study quality.


Criteria for considering studies for this review

Types of studies

We will consider longitudinal cohort studies in which index test results are obtained at baseline and the reference standard results at follow-up (see Index tests; Reference standards). These studies necessarily employ delayed verification of conversion to dementia and are sometimes labelled as ‘delayed verification cross-sectional studies’ (Bossuyt 2008; Knottnerus 2002). 

We will include case-control studies if they incorporate a delayed verification design. We believe this can only occur in the context of a cohort study, so these studies are invariably diagnostic nested case-control studies.


Participants recruited and clinically classified as having mild cognitive impairment (MCI) at baseline will be eligible for inclusion in this review. The diagnosis of MCI will be established using the Petersen criteria or revised Petersen criteria (Petersen 1999; Petersen 2004; Winblad 2004) and/or the Matthews criteria (Matthews 2008) and/or CDR = 0.5 (Morris 1993). These criteria include: subjective complaints; a decline in memory objectively verified by neuropsychological testing in combination with a history from the patient; a decline in other cognitive domains; no or minimal impairment of activities of daily living; not meeting the criteria for dementia. Therefore, the eligible participants will have had a number of tests, e.g. neuropsychological tests for cognitive deficit and checklists for activities of daily living, before study entry. Participants will be defined as amnestic single domain, amnestic multiple domain, non-amnestic single domain, non-amnestic multiple domain or non-specified MCI participants.

We will include participants from secondary and tertiary settings. Although demographic and clinical characteristics of MCI as well as sources of recruitment might differ in those settings, we have decided not to limit our review by setting; instead, we will look for variation within and between settings, and will examine the potential influence of the setting on diagnostic performance of the index test in the analyses.

We will exclude those studies that include patients with MCI possibly caused by: i) a current or history of alcohol/drug abuse; ii) central nervous system (CNS) trauma (e.g. subdural haematoma), tumour or infection; iii) other neurological conditions, e.g. Parkinson’s or Huntington’s diseases.

Detail of the causes of study drop-outs is crucial and if such data are missing the reliability of the conclusions must be questioned.

Index tests

  1. CSF tau

  2. CSF p-tau

  3. CSF tau/ABeta ratio

  4. CSF p-tau/ABeta ratio

There are currently no generally accepted standards for CSF tau and CSF p-tau positivity threshold, and therefore it is not possible to pre-specify test positivity threshold.

Criteria for CSF tau and CSF p-tau positivity: we will use the criteria which were applied in each included primary study to classify participants as either test positive or test negative.

Measure of index test: tau and p-tau level in CSF.

We will not include a comparator test because there are currently no standard practice tests available for the diagnosis of dementia. We will compare the index tests with a reference standard.

Target conditions

There are two target conditions in this review:

  1. Alzheimer’s disease dementia (conversion from MCI to Alzheimer’s disease dementia)

  2. Any other forms of dementia (conversion from MCI to any other forms of dementia)

Reference standards

For the purpose of this review, several definitions of Alzheimer’s disease dementia are acceptable.  Included studies may apply probable or possible NINCDS-ADRDA criteria (McKhann 1984). The Diagnostic and Statistical Manual of Mental Disorders (DSM) and International Classification of Diseases (ICD) definitions for Alzheimer’s disease dementia will also be acceptable. It should be noted that different iterations of these standards may not be directly comparable over time (e.g. DSM-IIIR versus DSM-IV or ICD9 versus ICD10). Moreover, the validity of the diagnoses may vary with the degree or manner in which the criteria have been operationalised (e.g. individual clinician versus algorithm versus consensus determination).  We will consider all these issues in interpreting the results, using sensitivity analyses as appropriate.

Similarly, differing clinical definitions of other dementias are acceptable. For Lewy body dementia the reference standard is the McKeith criteria (McKeith 1996; McKeith 2005). For frontotemporal dementia the reference standards are the Lund criteria (LMG 1994), Neary 1998, Boxer 2005, DSM and ICD. For vascular dementia the reference standards are the NINDS-AIREN criteria (Roman 1993), DSM and ICD.

The time interval over which progression from MCI to Alzheimer’s disease dementia or other forms of dementia happened is also important. The minimum period of delay in the verification of the diagnosis (i.e. the time between the assessment at which a diagnosis of MCI is made and the assessment at which the diagnosis of dementia is made) is one year. Where a mean duration is specified, we will exclude the study if the mean minus one standard deviation is less than one year, which will ensure that no more than 16% of participants were followed up for less than one year if the follow-up period is normally distributed. If our assumptions regarding distribution are not met then we can develop new methods for standardising the follow-up period using, for example, quartiles.
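The exclusion rule above can be illustrated with a short Python sketch. It assumes that a study reports follow-up as a mean and standard deviation in years; the function names are hypothetical and are not part of the protocol.

```python
import math

MIN_FOLLOW_UP_YEARS = 1.0  # minimum delay in verification stated in the protocol


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def exclude_on_follow_up(mean_years: float, sd_years: float) -> bool:
    """Return True if the study fails the 'mean minus one SD' rule."""
    return (mean_years - sd_years) < MIN_FOLLOW_UP_YEARS


def share_below_minimum(mean_years: float, sd_years: float) -> float:
    """Expected proportion followed for under one year, assuming normality."""
    return normal_cdf((MIN_FOLLOW_UP_YEARS - mean_years) / sd_years)


# Proportion of a normal distribution lying more than one SD below the mean.
print(round(normal_cdf(-1.0), 3))      # roughly 16%
print(exclude_on_follow_up(2.0, 0.8))  # 2.0 - 0.8 = 1.2 years, so retained
print(exclude_on_follow_up(1.5, 0.8))  # 1.5 - 0.8 = 0.7 years, so excluded
```

The 16% figure holds only if follow-up times are approximately normally distributed, which is why the protocol allows for alternative summaries such as quartiles.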

If possible, we will segment analyses into separate follow-up mean periods for the delay in verification: one year to less than two years; two to less than four years; and more than four years.  In this eventuality we will clearly note where the same included studies contribute to the analysis for more than one reference standard.
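As a minimal sketch of this grouping, the function below assigns a mean follow-up duration to one of the three periods. The function name and labels are hypothetical. The protocol does not say where a mean of exactly four years belongs, so placing it in the longest period is an assumption made here.

```python
from typing import Optional


def follow_up_band(mean_years: float) -> Optional[str]:
    """Assign a mean follow-up duration (in years) to a verification period."""
    if mean_years < 1.0:
        return None  # below the one-year minimum, so not eligible
    if mean_years < 2.0:
        return "1 to <2 years"
    if mean_years < 4.0:
        return "2 to <4 years"
    # Assumption: exactly four years is grouped with the longest period.
    return "4 years or more"


for years in (0.5, 1.0, 3.9, 6.0):
    print(years, follow_up_band(years))
```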

Search methods for identification of studies

We will search a variety of information sources to ensure all relevant studies are included. The Trials Search Coordinator of the Cochrane Dementia and Cognitive Improvement Review Group will devise search strategies for electronic database searching.

Electronic searches

We will search:

  • MEDLINE (OvidSP),

  • EMBASE (OvidSP),

  • Science Citation Index (ISI Web of Knowledge),

  • PsycINFO (OvidSP), and

  • LILACS (Bireme).

See Appendix 1 for a proposed draft strategy to be run in MEDLINE (OvidSP). We will design similarly structured search strategies using search terms and syntax appropriate for each database listed above. We will request a search of the Cochrane Register of Diagnostic Test Accuracy Studies (Cochrane Renal Group).

We will also search:

for relevant systematic reviews and meta-analyses.

We will make no restriction based on language of study. We will use translation services as necessary. We will not use search filters (collections of terms aimed at reducing the number needed to screen) as an overall limiter because those published have not proved sensitive enough (Whiting 2011a).

Initial searches will be performed by a single researcher with extensive experience of conducting systematic reviews.

Searching other resources

Grey literature: chosen electronic databases will include assessments of conference proceedings.

Handsearching: we will not perform handsearching. At present there is little published evidence of the benefits of handsearching for reports of DTA studies (Glanville 2010).

Reference lists: we will scan the reference lists of all eligible studies and reviews in the field for further possible titles, and repeat the process until no new titles are found (Greenhalgh 2005).

Correspondence: we will contact research groups who have published or are conducting work on diagnostic tests for dementia. Groups to contact will be informed by the initial results of our literature search.

Data collection and analysis

Selection of studies

Two researchers (EL and AN-S) will screen all titles and abstracts generated by the electronic database searches for relevance. 

Two researchers (EL and AN-S) will independently review the remaining abstracts of selected titles and select all potentially-eligible studies for full text review.  Two researchers (NS and EL) will then independently further assess full manuscripts against the inclusion criteria (see Criteria for considering studies for this review).  Where necessary, a third arbitrator (CWR) will resolve disagreements that the two researchers cannot resolve through discussion.

Where a study may include useable data but these are not presented in the published manuscript, we will contact the authors directly to request further information.  If the same data set is presented in more than one paper we will include only the primary paper.

We will detail the numbers of studies selected at each point in a PRISMA flow diagram.

Data extraction and management

We will extract the data on study characteristics into the Excel-based template developed by the Diagnostic Test Accuracy Unit in Birmingham released in July 2012 tailored for the needs of these review data. The template includes:

Bibliographic details of primary paper:

  • Author, title of study, year and journal

Basic clinical and demographic details:

  • Number of subjects

  • Mild cognitive impairment (MCI) clinical criteria

  • Age

  • Gender

  • Setting

  • Participant recruitment

  • Sampling procedures

Details of the index test:

  • Method of the CSF tau and CSF tau/ABeta ratio test administration, including who administered the test

  • Thresholds used to define positive and negative test results

  • Collection handling

Details of the reference standard:

  • Definition of Alzheimer's disease dementia and other dementias used by the reference standard

  • Duration of follow-up from the time of index test application to diagnosis of Alzheimer’s disease dementia or other dementias by the reference standard: 1 year to < 2 years; 2 to < 4 years; and > 4 years. If participants have been followed for varied amounts of time we will record a mean follow-up period for each included study. If possible, we will group those data into minimum, maximum and median follow-up periods; these may then become the subject of subgroup analyses.

  • Prevalence or proportion of the population developing Alzheimer's disease dementia and other dementias, with severity, if described.

The results of the 2x2 tables cross-tabulating index test results with the results of the reference standard/s

Table 1: Conversion from MCI to Alzheimer’s disease dementia
| Index test information | Reference standard information: ADD present | Reference standard information: ADD absent |
|---|---|---|
| Index test positive | Index test + who convert to ADD (TP) | Index test + who remain MCI (FP) & Index test + who convert to non-ADD (FP) |
| Index test negative | Index test - who convert to ADD (FN) | Index test - who remain MCI (TN) & Index test - who convert to non-ADD (TN) |


Table 2: Conversion from MCI to non-Alzheimer’s disease dementia
| Index test information | Reference standard information: Non-ADD present | Reference standard information: Non-ADD absent |
|---|---|---|
| Index test positive | Index test + who convert to non-ADD (TP) | Index test + who remain MCI (FP) & Index test + who convert to ADD (FP) |
| Index test negative | Index test - who convert to non-ADD (FN) | Index test - who remain MCI (TN) & Index test - who convert to ADD (TN) |


Table 3: Conversion from MCI to any form of dementia
| Index test information | Reference standard information: Any form of dementia present | Reference standard information: Dementia absent |
|---|---|---|
| Index test positive | Index test + who convert to any form of dementia (TP) | Index test + who remain MCI (FP) |
| Index test negative | Index test - who convert to any form of dementia (FN) | Index test - who remain MCI (TN) |

The numbers lost to follow-up
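To show how the three 2x2 tables relate to one another, the following Python sketch derives all three from a single list of participants. The outcome labels, the function name and the example cohort are hypothetical; the cohort is made up purely for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical outcome labels for the reference standard at follow-up.
ADD, NON_ADD, STABLE_MCI = "ADD", "non-ADD", "MCI"


def two_by_two(records: Iterable[Tuple[bool, str]], targets: Set[str]) -> Dict[str, int]:
    """Cross-tabulate index test results against a target condition.

    records: (index_test_positive, outcome) pairs, one per participant.
    targets: outcomes counted as 'target condition present'.
    """
    counts: Counter = Counter()
    for test_positive, outcome in records:
        has_condition = outcome in targets
        if test_positive:
            counts["TP" if has_condition else "FP"] += 1
        else:
            counts["FN" if has_condition else "TN"] += 1
    return {cell: counts[cell] for cell in ("TP", "FP", "FN", "TN")}


# Made-up participants, for illustration only.
cohort = [
    (True, ADD), (True, ADD), (True, NON_ADD), (True, STABLE_MCI),
    (False, ADD), (False, NON_ADD), (False, STABLE_MCI), (False, STABLE_MCI),
]

table_1 = two_by_two(cohort, {ADD})           # conversion to ADD
table_2 = two_by_two(cohort, {NON_ADD})       # conversion to non-ADD dementia
table_3 = two_by_two(cohort, {ADD, NON_ADD})  # conversion to any dementia
print(table_1, table_2, table_3)
```

Note that a participant who converts to non-ADD dementia counts as 'condition absent' in Table 1 but as 'condition present' in Tables 2 and 3, so the same participant can sit in different cells across the three tables.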

Assessment of methodological quality

We will assess methodological quality of each study using the QUADAS-2 tool (Whiting 2011b) as recommended by the Cochrane Collaboration.  The tool is made up of four domains: Patient selection; Index test; Reference standard; Patient flow.  Each domain is assessed in terms of risk of bias, with the first three domains also considered in terms of applicability (Appendix 2).  The components of each of these domains and a rubric which details how judgments concerning risk of bias are made are detailed in Appendix 3.  Certain key areas important to quality assessment are participant selection, blinding and missing data.

We will pilot a QUADAS-2 assessment on two papers. If agreement is poor, we will refine the signalling questions. We will not use QUADAS-2 data to form a summary quality score. We will produce a narrative summary describing numbers of studies that we considered contained high/low/unclear risk of bias as well as concerns regarding applicability.

Statistical analysis and data synthesis

We will apply the DTA framework for the analysis of a single test and extract the data from a study into a 2x2 table, showing the binary test results cross-classified with the binary reference standard and ignoring any censoring that might have occurred.  We acknowledge that such a reduction in the data may represent a significant oversimplification. We will therefore adopt an intention-to-diagnose (ITD) approach as well. If possible, we will present what the result would be if all dropouts would have developed dementia, and if all dropouts would not have developed dementia.  We may also need to assume that the proportion of positive and negative test results is the same in the unknown as the known participants in order to do this. We will examine the effects of imputation and data from censored participants in a sensitivity analysis, and censoring in the discussion of results.
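The two extreme scenarios described above can be sketched as follows. This is an illustration under stated assumptions, not the review's analysis code: the class and function names are hypothetical, the counts are made up, and dropouts with unknown index test results are split in the same proportion as participants with known results.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Counts:
    tp: int
    fp: int
    fn: int
    tn: int


def sensitivity(c: Counts) -> float:
    return c.tp / (c.tp + c.fn)


def specificity(c: Counts) -> float:
    return c.tn / (c.tn + c.fp)


def split_dropouts(c: Counts, n_dropouts: int) -> Tuple[int, int]:
    """Split dropouts into test-positive and test-negative groups.

    Assumes the share of positive index tests among dropouts equals the
    share among participants with complete follow-up.
    """
    total = c.tp + c.fp + c.fn + c.tn
    positives = round(n_dropouts * (c.tp + c.fp) / total)
    return positives, n_dropouts - positives


def impute_dropouts(c: Counts, pos: int, neg: int, all_convert: bool) -> Counts:
    """Add dropouts to the 2x2 table under one extreme assumption."""
    if all_convert:  # every dropout is assumed to have developed dementia
        return Counts(c.tp + pos, c.fp, c.fn + neg, c.tn)
    # no dropout is assumed to have developed dementia
    return Counts(c.tp, c.fp + pos, c.fn, c.tn + neg)


observed = Counts(tp=30, fp=10, fn=10, tn=50)  # made-up counts
pos, neg = split_dropouts(observed, n_dropouts=10)
all_converted = impute_dropouts(observed, pos, neg, all_convert=True)
none_converted = impute_dropouts(observed, pos, neg, all_convert=False)

for label, c in [("observed", observed), ("all convert", all_converted),
                 ("none convert", none_converted)]:
    print(label, round(sensitivity(c), 3), round(specificity(c), 3))
```

Comparing the three rows gives a rough sense of how far the accuracy estimates could move if the dropouts had been followed up.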

We will use data from the 2x2 tables abstracted from the included studies (TP, FN, FP, TN) and entered into RevMan to calculate the sensitivities, specificities and their 95% confidence intervals. We will also present individual study results graphically by plotting estimates of sensitivities and specificities in both a forest plot and a receiver operating characteristic (ROC) space. If more than one threshold is published in primary studies we will report accuracy estimates for all thresholds.
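For readers who want to see the underlying arithmetic, the sketch below computes sensitivity and specificity with approximate 95% confidence intervals from the four cells of a 2x2 table. The Wilson score interval is used here as an assumption; the protocol does not state which interval method RevMan applies, and the counts are made up.

```python
import math
from typing import Dict, Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half_width = z * math.sqrt(
        p * (1.0 - p) / total + z * z / (4.0 * total * total)
    ) / denom
    return centre - half_width, centre + half_width


def accuracy(tp: int, fp: int, fn: int, tn: int) -> Dict[str, tuple]:
    """Point estimates and intervals for sensitivity and specificity."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
    }


result = accuracy(tp=40, fp=15, fn=10, tn=85)  # made-up counts
for name, (estimate, (low, high)) in result.items():
    print(f"{name}: {estimate:.2f} (95% CI {low:.2f} to {high:.2f})")
```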

If there are sufficient and adequate data we will meta-analyse the pairs of sensitivity and specificity. We will use the hierarchical summary ROC curve (HSROC) method proposed by Rutter and Gatsonis (Rutter 2001). We will conduct these analyses in SAS software. Particularly if there are common thresholds across included studies we might also consider the bivariate random effects approach (Macaskill 2010).

If studies report multiple thresholds we will include the most frequently used cut-off, across all included studies, in meta-analysis. We recognise the limitation of this data-driven approach (Leeflang 2006) but there are no standard thresholds used in practice. We will acknowledge and consider this further in the ‘Discussion’ section of our review.
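One way to identify the most frequently used cut-off is sketched below. The function name, the study labels and the threshold values are all hypothetical. Each study is counted once per threshold, and ties are returned together rather than resolved, because the protocol does not specify a tie-breaking rule.

```python
from collections import Counter
from typing import Dict, List


def most_frequent_thresholds(study_thresholds: Dict[str, List[float]]) -> List[float]:
    """Return the threshold(s) reported by the largest number of studies."""
    counts = Counter(
        threshold
        for thresholds in study_thresholds.values()
        for threshold in set(thresholds)  # count each study once per threshold
    )
    if not counts:
        return []
    top = max(counts.values())
    return sorted(t for t, n in counts.items() if n == top)


# Made-up thresholds, for illustration only.
reported = {
    "Study A": [300.0, 350.0],
    "Study B": [350.0],
    "Study C": [350.0, 400.0],
    "Study D": [300.0],
}
print(most_frequent_thresholds(reported))
```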

We will explore the implications of any credible summary accuracy estimates emerging by considering the numbers of false positives and false negatives in populations with different prevalence of dementia subtypes, and by presenting the results as natural frequencies and using alternative metrics such as likelihood ratios and predictive values.
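The sketch below shows how a summary sensitivity and specificity translate into natural frequencies, predictive values and likelihood ratios at different prevalences. The accuracy values, the prevalences and the function names are hypothetical and chosen only to illustrate the arithmetic.

```python
from typing import Dict, Tuple


def likelihood_ratios(sens: float, spec: float) -> Tuple[float, float]:
    """Positive and negative likelihood ratios."""
    return sens / (1.0 - spec), (1.0 - sens) / spec


def natural_frequencies(sens: float, spec: float, prevalence: float,
                        population: int = 1000) -> Dict[str, float]:
    """Expected 2x2 cells and predictive values in a notional population."""
    with_condition = prevalence * population
    without_condition = population - with_condition
    tp = sens * with_condition
    fn = with_condition - tp
    tn = spec * without_condition
    fp = without_condition - tn
    return {
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }


SENS, SPEC = 0.80, 0.90  # hypothetical summary estimates

lr_pos, lr_neg = likelihood_ratios(SENS, SPEC)
print("LR+", round(lr_pos, 2), "LR-", round(lr_neg, 2))

for prevalence in (0.10, 0.30):  # hypothetical conversion rates
    freq = natural_frequencies(SENS, SPEC, prevalence)
    print(prevalence, {k: round(v, 2) for k, v in freq.items()})
```

Sensitivity and specificity are held fixed here, yet the number of false positives per 1000 people tested and the positive predictive value both change with prevalence. This is why the results need to be considered in populations with different prevalence.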

Investigations of heterogeneity

The following factors could be relevant in clinical practice because they affect the interpretation of the test result, so it is crucial to understand the potential sources of heterogeneity that can be identified within the clinical setting. These include patient factors such as age, illness severity and genetic risk, as well as how the clinical population in which the test is applied has been defined, given that slightly different clinical criteria exist and may be used. They may also include differing assay methods for CSF tau and ABeta. All these factors may influence the accuracy of the test as it is applied in practice.

The framework for the investigation of possible sources of heterogeneity includes the following factors:

Target population

  • Spectrum of patients (mean age, gender, Mini-Mental State Examination, ApoE status). Concerning age, any studies that include 30% of participants below the age of 65 will be examined separately.

  • Clinical criteria of MCI at baseline: e.g. Petersen criteria versus CDR = 0.5 versus different MCI classification (Matthews 2008)

  • Clinical settings: e.g. secondary care versus tertiary care

Index test

  • Thresholds

  • Technical features: e.g. ELISA vs Innogenetics kit

  • Operator characteristics: e.g. training of assessors

Target disorder

  • Reference standard/s used: e.g. NINCDS-ADRDA versus DSM versus ICD10 for Alzheimer's disease dementia

  • Operationalisation of criteria used for the definition of a dementia syndrome: e.g. individual clinician/algorithm/consensus group.

Study quality

  • Types of studies: longitudinal cohort studies or diagnostic nested case-control studies

  • Blinding: prior clinical information will increase accuracy of the index test.

  • Duration of follow-up: 1 year to < 2 years versus 2 years to < 4 years versus > 4 years

  • Loss to follow-up: we will consider separately those studies that have more than 20% attrition
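The two numerical cut-offs in this framework (30% of participants below the age of 65, and more than 20% attrition) can be applied as simple flags, as sketched below. The class, field and function names are hypothetical and the study data are made up. Reading the age criterion as 'at least 30%' is an assumption, since the protocol does not state whether the boundary is inclusive.

```python
from dataclasses import dataclass


@dataclass
class StudySummary:
    name: str
    n_enrolled: int
    n_under_65: int
    n_lost_to_follow_up: int


def needs_separate_age_analysis(study: StudySummary, cutoff: float = 0.30) -> bool:
    """Flag studies in which at least 30% of participants are under 65 (assumed inclusive)."""
    return study.n_under_65 / study.n_enrolled >= cutoff


def has_high_attrition(study: StudySummary, cutoff: float = 0.20) -> bool:
    """Flag studies with more than 20% loss to follow-up."""
    return study.n_lost_to_follow_up / study.n_enrolled > cutoff


# Made-up studies, for illustration only.
studies = [
    StudySummary("Study A", n_enrolled=200, n_under_65=70, n_lost_to_follow_up=30),
    StudySummary("Study B", n_enrolled=150, n_under_65=30, n_lost_to_follow_up=45),
]
for s in studies:
    print(s.name, needs_separate_age_analysis(s), has_high_attrition(s))
```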

To investigate the effects of the sources of heterogeneity, we will perform a descriptive analysis by visual examination of the forest plot of sensitivity and specificity and the ROC plot. If there are sufficient included studies, subgroup analyses will be performed in the RevMan software.
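As a minimal sketch of the descriptive step, the per-study sensitivity and specificity that would feed a forest plot can be computed from each 2x2 table and grouped by one covariate. The study labels, counts and the choice of clinical setting as the grouping variable are hypothetical and serve only as illustration; they are not data from this review.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Study:
    name: str      # hypothetical study label
    setting: str   # covariate used for grouping, e.g. "secondary" or "tertiary"
    tp: int        # converters with a positive index test
    fp: int        # non-converters with a positive index test
    fn: int        # converters with a negative index test
    tn: int        # non-converters with a negative index test


def sensitivity(s: Study) -> float:
    return s.tp / (s.tp + s.fn)


def specificity(s: Study) -> float:
    return s.tn / (s.tn + s.fp)


def by_subgroup(studies):
    """Group (name, sensitivity, specificity) rows by clinical setting."""
    groups = defaultdict(list)
    for s in studies:
        groups[s.setting].append((s.name, sensitivity(s), specificity(s)))
    return dict(groups)


# Invented 2x2 counts, for illustration only.
studies = [
    Study("Study A", "secondary", tp=40, fp=10, fn=10, tn=40),
    Study("Study B", "tertiary", tp=45, fp=20, fn=5, tn=30),
    Study("Study C", "secondary", tp=30, fp=5, fn=20, tn=45),
]

for setting, rows in by_subgroup(studies).items():
    for name, sens, spec in rows:
        print(f"{setting:10s} {name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

The same grouping could be repeated for any other covariate in the framework, such as the MCI criteria or the reference standard used.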

Sensitivity analyses

If not already explored as part of the investigation of heterogeneity above, we will perform sensitivity analyses for the above covariates, if appropriate. For example, to investigate the influence of study quality on the overall diagnostic accuracy of the CSF biomarkers, we will omit studies at high risk of bias (see Appendix 2).
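The logic of omitting high-risk studies can be sketched as follows. This sketch pools crudely by summing 2x2 cells, which is a simplification chosen for brevity and not the hierarchical model a review would normally fit. All counts and the `high_risk` flags are hypothetical.

```python
def crude_pooled(studies):
    """Crude pooled (sensitivity, specificity) from summed 2x2 cells."""
    tp = sum(s["tp"] for s in studies)
    fp = sum(s["fp"] for s in studies)
    fn = sum(s["fn"] for s in studies)
    tn = sum(s["tn"] for s in studies)
    return tp / (tp + fn), tn / (tn + fp)


# Invented studies; "high_risk" stands in for a QUADAS-2 judgement.
studies = [
    {"name": "Study A", "tp": 40, "fp": 10, "fn": 10, "tn": 40, "high_risk": False},
    {"name": "Study B", "tp": 48, "fp": 5, "fn": 2, "tn": 45, "high_risk": True},
    {"name": "Study C", "tp": 30, "fp": 15, "fn": 20, "tn": 35, "high_risk": False},
]

all_sens, all_spec = crude_pooled(studies)
low_risk = [s for s in studies if not s["high_risk"]]
low_sens, low_spec = crude_pooled(low_risk)

print(f"All studies:    sensitivity={all_sens:.3f}, specificity={all_spec:.3f}")
print(f"Low risk only:  sensitivity={low_sens:.3f}, specificity={low_spec:.3f}")
```

A marked shift between the two lines would suggest that the overall estimates are sensitive to the inclusion of lower-quality studies.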

We will also perform a sensitivity analysis with and without the intention-to-diagnose (ITD) approach. Where data are available we will undertake an analysis of reported intermediate points in the development of dementia.

In addition, we will evaluate the effects of data-driven threshold selection studies on overall diagnostic accuracy of CSF tau and CSF tau/ABeta ratio tests.

Assessment of reporting bias

We will not investigate reporting bias because of current uncertainty about how it operates in test accuracy studies and the interpretation of existing analytical tools such as funnel plots.  We may investigate the effect of the presence of potential conflicts of interest as part of any investigation of heterogeneity.


Appendix 1. MEDLINE search strategy

The MEDLINE search strategy below has been created to optimise sensitivity. The strategy utilises a number of concepts:

Concept A: lines 1 to 21 health condition/s of interest

Concept B: lines 23 to 42 the index test/s or what the index test/s measure

Concept C: lines 44 to 49 method of measurement (i.e. CSF)

The main yield is created by combining A AND B AND C

However, in order to try to capture those records that perhaps do not mention one or more of the three concepts above, some additional combinations were added to the strategy. For example: lines 53 and 54 (which identify records in MEDLINE with the dementia MeSH subheading of diagnosis and those with a subheading of cerebrospinal fluid) were combined with the concept for the index test/s. This approach identified unique records and an examination of the first 50 of these records resulted in two further citations for possible inclusion within the review.

1. exp Dementia/

2. Cognition Disorders/

3. (alzheimer* or dement* or AD or lewy* or VaD or frontotemporal or "vascular cognit* impair*").ti,ab.

4. ((cognit* or memory or cerebr* or mental*) adj3 (declin* or impair* or los* or deteriorat* or degenerat* or complain* or disturb* or disorder*)).ti,ab.

5. (forgetful* or confused or confusion).ti,ab.

6. MCI.ti,ab.

7. ACMI.ti,ab.

8. ARCD.ti,ab.

9. SMC.ti,ab.

10. CIND.ti,ab.

11. BSF.ti,ab.

12. AAMI.ti,ab.

13. LCD.ti,ab.

14. QD.ti,ab.

15. AACD.ti,ab.

16. MNCD.ti,ab.

17. MCD.ti,ab.

18. (nMCI or aMCI or mMCI).ti,ab.

19. ("N-MCI" or "A-MCI" or "M-MCI").ti,ab.

20. "Petersen".ab.

21. ((CDR adj2 "0.5") or ("clinical dementia rating" adj3 "0.5")).ab.

22.  or/1-21

23. (neurofibril* adj3 tangle*).ti,ab.

24. (neurofilament adj3 protein*).ti,ab.

25. (neuropil adj3 thread*).ti,ab.

26. ((senile or amyloid or neuritic) adj3 plaque*).ti,ab.

27. Neuropil Threads/

28. Senile Plaques/

29. exp Neurofibrils/

30. Neurofilament Proteins/

31. tau Proteins/

32. tau*.ti,ab.

33. hyperphosphorylation.ti,ab.

34. pTau181.ti,ab.

35. tau181.ti,ab.

36. *peptide fragments/cf

37. pTau*.ti,ab.

38. ("t-tau*" or "p-tau*").ti,ab.

39. (innotest or inno-bia or Alzbio3).ti,ab.

40. ((abeta* or ab42 or ab40 or "amyloid-beta" or "beta-amyloid" or "a?42" or "a?40" or "a beta") adj4 (ratio or ratios)).ti,ab.

41. ("phospho-tau*" or "total-tau*").ti,ab.

42. tau231.ti,ab.

43. or/23-42

44. (cerebrospinal fluid* or csf or "spinal fluid*").ti,ab.

45. (blood or plasma).ti,ab.

46. Cerebrospinal Fluid/

47. Blood-Brain Barrier/

48. or/44-47

49. (cf or bl or di or du).fs.

50. or/48-49

51. 50 and 43 and 22

52. exp *Dementia/cf [Cerebrospinal Fluid]

53. exp Dementia/di [Diagnosis]

54. cf.fs.

55. 43 and 53 and 54

56. Cerebrospinal Fluid Proteins/

57. Biological Markers/cf [Cerebrospinal Fluid]

58. or/56,57

59. 58 and 22 and 43

60. or/51,52,55,59

61. (animals not (humans and animals)).sh.

62. 60 not 61
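The Boolean structure of lines 51 to 62 can be sketched with set operations. The record IDs below are invented placeholders for what each intermediate line might retrieve. Only the way the lines are combined mirrors the strategy above.

```python
# Hypothetical record IDs retrieved by selected intermediate lines.
line_22 = {1, 2, 3, 4, 5, 6}   # Concept A: health condition/s of interest
line_43 = {2, 3, 4, 5, 7}      # Concept B: index test/s
line_50 = {3, 4, 5, 8}         # Concept C: method of measurement
line_52 = {9}                  # exp *Dementia/cf
line_53 = {2, 4, 10}           # exp Dementia/di
line_54 = {2, 3, 11}           # cf.fs.
line_58 = {5, 7, 12}           # CSF proteins or biological markers/cf
line_61 = {4}                  # animal-only records

line_51 = line_50 & line_43 & line_22            # main yield: A AND B AND C
line_55 = line_43 & line_53 & line_54            # index test AND diagnosis AND cf subheading
line_59 = line_58 & line_22 & line_43            # marker headings AND A AND B
line_60 = line_51 | line_52 | line_55 | line_59  # union of all routes
line_62 = line_60 - line_61                      # drop animal-only records

print(sorted(line_62))
```

Here records 2 and 9 reach the final set only through the additional combinations, which is the purpose those lines serve in the strategy.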

Appendix 2. Assessment of methodological quality table QUADAS-2 tool

Description

  • Patient selection: Describe methods of patient selection. Describe included patients (prior testing, presentation, intended use of index test and setting)

  • Index test: Describe the index test and how it was conducted and interpreted

  • Reference standard: Describe the reference standard and how it was conducted and interpreted

  • Patient flow: Describe any patients who did not receive the index test(s) and/or reference standard or who were excluded from the 2x2 table (refer to flow diagram). Describe the time interval and any interventions between index test(s) and reference standard

Signalling questions

  • Patient selection: Was a consecutive or random sample of patients enrolled? Was a case-control design avoided? Did the study avoid inappropriate exclusions?

  • Index test: Were the index test results interpreted without knowledge of the results of the reference standard? If a threshold was used, was it pre-specified?

  • Reference standard: Is the reference standard likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index test?

  • Patient flow: Was there an appropriate interval between index test(s) and reference standard? Did all patients receive a reference standard? Did all patients receive the same reference standard? Were all patients included in the analysis?

Risk of bias: High/low/unclear

  • Patient selection: Could the selection of patients have introduced bias?

  • Index test: Could the conduct or interpretation of the index test have introduced bias?

  • Reference standard: Could the reference standard, its conduct, or its interpretation have introduced bias?

  • Patient flow: Could the patient flow have introduced bias?

Concerns regarding applicability: High/low/unclear

  • Patient selection: Are there concerns that the included patients do not match the review question?

  • Index test: Are there concerns that the index test, its conduct, or interpretation differ from the review question?

  • Reference standard: Are there concerns that the target condition as defined by the reference standard does not match the review question?

Appendix 3. Anchoring statements for quality assessment of CSF tau and tau/ABeta ratio biomarkers diagnostic studies

Category: Patients

  • Review question: Participants with mild cognitive impairment, no dementia

  • Inclusion criteria: Participants fulfilling the criteria for the clinical diagnosis of MCI at baseline

Category: Index test

  • Review question: CSF tau; CSF p-tau; CSF tau/ABeta ratio

  • Inclusion criteria: CSF tau; CSF p-tau; CSF tau/ABeta ratio

Category: Target condition

  • Review question: Alzheimer’s disease dementia (conversion from MCI to Alzheimer’s disease dementia); any other forms of dementia (conversion from MCI to any other forms of dementia)

  • Inclusion criteria: Alzheimer’s disease dementia (conversion from MCI to Alzheimer’s disease dementia); any other forms of dementia (conversion from MCI to any other forms of dementia)

Category: Reference standard

  • Review question: NINCDS-ADRDA; DSM; ICD; McKeith criteria; Lund criteria; NINDS-AIREN criteria

  • Inclusion criteria: NINCDS-ADRDA; DSM; ICD; McKeith criteria; Lund criteria; NINDS-AIREN criteria

Category: Outcome

  • Review question: N/A

  • Inclusion criteria: Data to construct 2x2 table

Category: Study design

  • Review question: N/A

  • Inclusion criteria: Longitudinal cohort studies and nested case-control studies if they incorporate a delayed verification design (case-control nested in cohort studies)

Anchoring statements for quality assessment of CSF tau and CSF tau/ABeta ratio

We provide some core anchoring statements for quality assessment of diagnostic test accuracy review of CSF tau and CSF tau/ABeta ratio biomarkers in dementia.  These statements are designed for use with the QUADAS-2 tool and are based on the guidance for quality assessment of diagnostic test accuracy reviews of IQCODE in dementia (Quinn 2012). 

During a two-day, multidisciplinary focus group and the piloting/validation of the guidance, it was clear that certain issues were key to assessing quality, while other issues were important to record but less important for assessing overall quality. To assist, we describe a 'weighting' system.  Where an item is weighted 'high risk' then that section of the QUADAS-2 results table is likely to be scored as high risk of bias.  For example, in dementia diagnostic test accuracy studies, ensuring that clinicians performing dementia assessment are blinded to results of index test is fundamental.  If this blinding was not present then the item on reference standard should be scored 'high risk of bias', regardless of the other contributory elements.

In assessing individual items, the score of 'unclear' should only be given if there is genuine uncertainty.  In these situations, review authors will contact the relevant study teams for additional information.
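One way to read the weighting system is as a simple rule that maps signalling-question answers to a domain-level judgement. The sketch below rests on assumptions. It treats any answer listed in an item's weighting line as sufficient for 'high', whereas the text says only that such a domain is "likely" to be scored high risk. The function name and data layout are hypothetical.

```python
def domain_risk(items):
    """Rate one QUADAS-2 domain from its signalling-question answers.

    items: list of (answer, high_risk_answers) pairs, where answer is
    "yes", "no" or "unclear", and high_risk_answers is the set of answers
    that the item's weighting line maps to high risk of bias.
    Assumption: one triggering answer is enough to rate the domain "high".
    """
    if any(answer in high_risk for answer, high_risk in items):
        return "high"
    if any(answer == "unclear" for answer, _ in items):
        return "unclear"
    return "low"


# Reference standard domain: unblinded assessment ('no') triggers high risk.
reference_standard = [("yes", {"no"}), ("no", {"no"})]
# An item weighted on both 'no' and 'unclear'.
missing_results = [("yes", {"no"}), ("unclear", {"no", "unclear"})]
# Genuine uncertainty on an item weighted only on 'no'.
uncertain = [("yes", {"no"}), ("unclear", {"no"})]

print(domain_risk(reference_standard))  # high
print(domain_risk(missing_results))     # high
print(domain_risk(uncertain))           # unclear
```

In practice the final judgement remains with the review authors, and the rule only makes the default outcome of the weighting explicit.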

Anchoring statements to assist with assessment for risk of bias

Patient selection

Was the sampling method appropriate?

Where sampling is used, the designs least likely to cause bias are consecutive sampling or random sampling.  Sampling that is based on volunteers or selecting subjects from a clinic or research resource is prone to bias.

Weighting: High risk of bias (‘no’)

Was a case-control or similar design avoided?

Designs similar to case control that may introduce bias are those designs in which the study team deliberately increase or decrease the proportion of subjects with the target condition, which may not be representative.  For example, a population study may be enriched with extra dementia subjects from a secondary care setting, who are typically more diseased. Some case control methods may already be excluded if they mix subjects from various settings.

Weighting: High risk of bias (‘no’)

Are exclusion criteria described and appropriate?

The study will be automatically graded as unclear if exclusions are not detailed (pending contact with study authors).  Where exclusions are detailed, the study will be graded as 'low risk' if exclusions are felt to be appropriate by the review authors.  Certain exclusions common to many studies of dementia are: medical instability; terminal disease; alcohol/substance misuse; concomitant psychiatric diagnosis; other neurodegenerative condition. Exclusions are not felt to be appropriate if ‘difficult to diagnose’ patients are excluded.

Post hoc and inappropriate exclusions will be labelled 'high risk' of bias.

Weighting: High risk (‘no’)

Index test

Was the assessment/interpretation of the CSF tau and CSF tau/ABeta ratio biomarkers performed without knowledge of the clinical dementia diagnosis?

Terms such as “blinded” or “independently and without knowledge of” are sufficient and full details of the blinding procedure are not required.  Interpretation of the results of the index test may be influenced by knowledge of the results of reference standard. If the index test is always interpreted prior to the reference standard then the person interpreting the index test cannot be aware of the results of the reference standard and so this item could be rated as ‘yes’.

For certain index tests the result is objective and knowledge of the reference standard should not influence it, for example the level of a protein in cerebrospinal fluid. In this instance the quality assessment may be 'low risk' even if blinding was not achieved.

Weighting: High risk (‘no’)

Were CSF tau and CSF tau/ABeta ratio biomarkers’ thresholds pre-specified?

For scales and biomarkers there is often a reference point (in units or categories) above which subjects are classified as 'test positive'; this may be referred to as the threshold, clinical cut-off or dichotomisation point.  A study is classified as 'high risk of bias' if the authors define the optimal cut-off post hoc based on their own study data, because selecting the threshold to maximise sensitivity and/or specificity may lead to overoptimistic measures of test performance.

Certain papers may use an alternative methodology for analysis that does not use thresholds and these papers should be classified as not applicable.

Weighting: High risk (‘no’)
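To make concrete what a post hoc, data-driven cut-off looks like, the sketch below picks the threshold that maximises the Youden index (sensitivity + specificity - 1) on the same data used to report accuracy. The Youden index is one common choice and is used here only as an example. The biomarker values are invented, and higher values are assumed to indicate a positive test.

```python
def youden_optimal_threshold(values, converted):
    """Return (threshold, Youden index) maximising sensitivity + specificity - 1.

    values: biomarker measurements; converted: matching booleans for
    conversion to dementia. A value >= threshold counts as test positive.
    Ties keep the lowest threshold.
    """
    pairs = list(zip(values, converted))
    n_pos = sum(1 for _, c in pairs if c)
    n_neg = len(pairs) - n_pos
    best = None
    for t in sorted(set(values)):
        tp = sum(1 for v, c in pairs if c and v >= t)
        tn = sum(1 for v, c in pairs if not c and v < t)
        j = tp / n_pos + tn / n_neg - 1
        if best is None or j > best[1]:
            best = (t, j)
    return best


# Invented CSF tau values (arbitrary units) for eight participants.
values = [450, 520, 380, 610, 300, 340, 400, 280]
converted = [True, True, True, True, False, False, False, False]

best = youden_optimal_threshold(values, converted)
print(best)
```

Because the cut-off is tuned to these particular participants, the resulting sensitivity and specificity would be expected to look better than they would in a new sample, which is why such studies are rated 'high risk of bias' on this item.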

Reference standard

Is the assessment used for clinical diagnosis of dementia acceptable?

Commonly used international criteria to assist with clinical diagnosis of dementia include those detailed in DSM-IV and ICD-10.  Criteria specific to dementia subtypes include but are not limited to NINCDS-ADRDA criteria for Alzheimer’s dementia; McKeith criteria for Lewy Body dementia; Lund criteria for frontotemporal dementias; and the NINDS-AIREN criteria for vascular dementia.  Where the criteria used for assessment are not familiar to the review authors or the Cochrane Dementia and Cognitive Improvement group (‘unclear’), this item should be classified as “high risk of bias”.

Weighting: High risk (‘no’)

Was clinical assessment for dementia performed without knowledge of the CSF tau and CSF tau/ABeta ratio biomarkers?

Terms such as “blinded” or “independently and without knowledge of” are sufficient and full details of the blinding procedure are not required.  Interpretation of the results of the reference standard may be influenced by knowledge of the results of index test.

Weighting: High risk (‘no’)

Patient flow

Was there an appropriate interval between CSF tau and CSF tau/ABeta ratio biomarkers and clinical dementia assessment?

As we test the accuracy of the CSF tau and CSF tau/ABeta ratio biomarkers for MCI conversion to dementia, there will always be a delay between the index test and the reference standard assessments. The time between reference standard and index test will influence the accuracy (Geslani 2005; Okello 2009; Visser 2006), and therefore we will note time as a separate variable (both within and between studies) and will test its influence on the diagnostic accuracy. We have set a minimum mean time to follow-up assessment of one year. If more than 16% of subjects have assessment for MCI conversion before nine months, this item will score ‘no’.

Weighting: High risk (‘no’)
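The interval rule above can be expressed as a small check on per-participant follow-up times. This is a sketch under stated assumptions. The function name, the return format and the follow-up times are hypothetical. The one-year minimum mean is reported as a separate flag and does not feed into the item score, since the text attaches the ‘no’ score only to the 16% rule.

```python
def assess_interval(followup_months, early_cutoff=9, max_early_fraction=0.16):
    """Summarise follow-up times and score the interval signalling question."""
    n = len(followup_months)
    mean_months = sum(followup_months) / n
    early_fraction = sum(1 for m in followup_months if m < early_cutoff) / n
    return {
        "mean_months": mean_months,
        "meets_minimum_mean": mean_months >= 12,  # minimum mean follow-up of one year
        "early_fraction": early_fraction,
        # 'no' when more than 16% were assessed before nine months
        "item_score": "no" if early_fraction > max_early_fraction else "yes",
    }


# Invented follow-up times in months for two studies of ten participants.
ok = assess_interval([6, 12, 18] + [24] * 7)   # 10% assessed early
bad = assess_interval([6, 6, 12] + [24] * 7)   # 20% assessed early

print(ok["item_score"], bad["item_score"])
```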

Did all subjects get the same assessment for dementia regardless of CSF tau and CSF tau/ABeta ratio biomarkers?

There may be scenarios where subjects who score 'test positive' on index test have a more detailed assessment.  Where dementia assessment differs between subjects this should be classified as high risk of bias.

Weighting: High risk (‘no’)

Were all patients who received CSF tau and CSF tau/ABeta ratio biomarkers’ assessment included in the final analysis?

If the number of patients enrolled differs from the number of patients included in the 2X2 table then there is the potential for bias. If patients lost to drop-out differ systematically from those who remain, then estimates of test performance may differ.

If there are drop-outs, these should be accounted for; a maximum proportion of drop-outs to remain 'low risk of bias' has been specified as 20%.

Weighting: High risk (‘no’)

Were missing or uninterpretable CSF tau and CSF tau/ABeta ratio biomarkers results reported?

Where missing or uninterpretable results are reported, and if there is substantial attrition (we have set an arbitrary value of 50% missing data), this should be scored as ‘no’.  If those results are not reported, this should be scored as ‘unclear’ and authors will be contacted.

Weighting: High risk (‘no’ and ‘unclear’)

Anchoring statements to assist with assessment for applicability

Patient selection

Were included patients representative of the general population of interest?

The included patients should match the intended population as described in the review question.  The review authors should consider population in terms of: symptoms; pre-testing; potential disease prevalence; setting.

If there is a clear ground for suspecting an unrepresentative spectrum the item should be rated 'poor applicability'.

Index test

Were sufficient data on CSF tau and CSF tau/ABeta ratio biomarkers’ application given for the test to be repeated in an independent study?

Variation in technology, test execution, and test interpretation may affect the estimate of accuracy. In addition, the background and training/expertise of the assessor should be reported and taken into consideration. If the CSF tau and CSF tau/ABeta ratio biomarkers were not performed consistently, this item should be rated 'poor applicability'.

Reference standard

Was clinical diagnosis of dementia made in a manner similar to current clinical practice?

For many reviews, inclusion criteria and assessment for risk of bias will already have assessed the dementia diagnosis.  For certain reviews an applicability statement relating to reference standard may not be applicable.  There is the possibility that a form of dementia assessment, although valid, may diagnose a far larger proportion of subjects with disease than usual clinical practice.  In this instance the item should be rated 'poor applicability'.

Contributions of authors

All authors contributed to the drafting of the protocol.

Declarations of interest

None known