Guidance for the preparation of neurological management guidelines by EFNS scientific task forces – revised recommendations 2012

Authors


Correspondence: M. A. Leone, Head and Neck Department, SCDU Neurology, Ospedale ‘Maggiore della Carità’, C.so Mazzini, 18 - 28100 Novara, Italy (tel.: +0321/3733.218; fax: +0321/3733.298; e-mail: maurizio.leone@maggioreosp.novara.it).

Abstract

This paper is meant to provide guidance to anyone wishing to write a neurological guideline for diagnosis or treatment, and is directed at the Scientific Panels and task forces of the European Federation of Neurological Societies (EFNS). It replaces the previous guidance paper from 2004. It contains several new aspects: the guidance is now based on a change of the grading system for evidence and for the resulting recommendations, and has adopted the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) system. The process of grading the quality of evidence and strength of recommendations can now be improved and made more transparent. The task forces embarking on the development of a guideline must now make clearer and more transparent choices about the outcomes considered most relevant when searching the literature and evaluating their findings. Thus, the outcomes chosen will be more critical, more patient-oriented and easier to translate into simple recommendations. This paper also provides updated practical recommendations for planning a guideline task force within the framework of the EFNS. Finally, we hope that this paper will also be approved by the relevant bodies of our future organization, the European Academy of Neurology.

Introduction

EFNS guidelines

The European Federation of Neurological Societies (EFNS) launched its guidelines program in 1997 [1]. Since then, more than 70 guidelines have been published in the European Journal of Neurology, and collected in two books [2, 3]. Recommendations in previous EFNS guidelines were graded on three levels. Quality of evidence of studies was classified into four levels according to a first guidance paper [1], whose definitions were later modified to make them similar to the American Academy of Neurology guidelines [4].

In the EFNS grading system, as in others, the recommendation rating was derived almost automatically from the type of study design, and it often remained unclear how panellists moved from a given level of evidence to a certain recommendation. Grading systems that are entirely based on study design are rather simplistic. They do not consider the consistency of effects across studies, the uncertainty around the results or the size of the effect. Randomized controlled clinical trials (RCTs) in the EFNS grading system were placed at the highest level of quality; although the RCT remains the gold standard study design to evaluate the efficacy of an intervention, questions regarding harms, or the diagnosis and prognosis of disease, may be best evaluated with other study designs [5, 6]. Observational studies are frequently used in neurology, and this peculiarity needs to be considered when evaluating the suitability of a system for grading evidence about neurological diseases.

The GRADE approach

The Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group was established in 2000, including a large number of researchers, health professionals and methodologists, with the aim to create a homogeneous and transparent approach to guideline creation and reporting [7]. After conducting a review of existing grading systems, the GRADE Group developed a system for grading the quality of evidence and strength of recommendations that addresses the disadvantages of previous systems. GRADE has received widespread international acceptance, and now includes a wide range of professional and international societies, medical journals and healthcare regulatory authorities, such as the World Health Organization, the Cochrane Collaboration, the Scottish Intercollegiate Guidelines Network, the National Institute for Health and Clinical Excellence, and the British Medical Journal. An updated list of Scientific Societies already adopting GRADE is available at http://www.gradeworkinggroup.org. The GRADE working group offers support that is not available for other grading systems, including a website and specific software. Other current resources developed by the GRADE group include the original series of articles for guideline developers published in the British Medical Journal [7-12], and an ongoing series of articles in the Journal of Clinical Epidemiology [13-24].

The GRADE system is based on a sequential assessment of the quality of evidence, followed by assessment of the balance between advantages and disadvantages, and finally judgment about the strength of recommendations. Recommendations are graded into only two categories, strong and weak, either for or against a clinical decision. Contrary to other grading systems, the whole process is explicit and transparent.

Strengths and limitations of GRADE

The major advantage of GRADE is that guideline developers must declare how agreement or disagreement is reached for each step involving a judgment. The GRADE approach also has several other strengths: (i) it provides a structured format that helps guideline users to easily follow the whole process of guideline development; (ii) it separates the judgments regarding the quality of evidence from those about the strength of recommendations; (iii) it is simple, providing only two categories of grading recommendations (strong/weak) in two directions (for/against); (iv) it highlights the importance of systematic reviews, incorporating summary effect estimates from meta-analyses; (v) it ranks the clinical importance of different outcomes; (vi) it allows the option of making different choices when clinicians have information that leads them to disagree with the judgments and have evidence that the values of their patients differ; and (vii) it applies to diagnostic as well as therapeutic questions.

GRADE also has some disadvantages, as follows. (i) Like other grading systems, it has not been tested for validity. (ii) The reproducibility of agreement about the quality of evidence and the balance of benefits and harms is highly variable [25]. However, lack of reproducibility is of less concern if judgments are made transparent and traceable, as in GRADE. GRADE distinguishes the collected evidence from its interpretation by experts. (iii) GRADE requires more resources to conduct detailed assessment of the evidence, such as systematic reviews.

Aims of this guidance

The EFNS has now chosen GRADE as the preferred method for rating the quality of evidence and strength of recommendations, consistent with other leading organizations. The aim of this paper is to provide guidance for members of EFNS Scientific Panels and other neurologists, health care professionals and health care providers developing guidelines about treatment and diagnosis of neurological diseases. It provides the view of an expert task force appointed by the Scientific Committee of the EFNS, and represents a peer-reviewed statement of standards for guidelines published by EFNS. It aims to improve the quality of EFNS guidelines, making the path from a given level of evidence to a certain recommendation explicit, transparent and uniform. This will enhance the applicability and usefulness of practice guidelines for everyday practice. It is not intended that guidelines should have legally binding implications in individual cases. Clinicians, patients, third-party payers, institutional review committees or the courts should never view recommendations as requirements. Even strong recommendations based on high-quality evidence will not apply to all circumstances and all patients.

This guidance is also intended for diagnostic guidelines. EFNS acknowledges that several factors could restrain the production of diagnostic guidelines in the next few years: the GRADE methodology for diagnostic guidelines has not yet been completely finalized, only a handful of Cochrane diagnostic reviews and other diagnostic meta-analyses have been published so far, and RCTs evaluating the impact of diagnostic tests or strategies are rare in neurology. However, we expect that this decision will foster the planning of well-designed observational and experimental diagnostic studies in neurological diseases.

Concepts of GRADE

The clinical question

Just as in any well-conducted research study, a guideline should address well-designed clinical questions that the recommendations should answer. Usually, one clinical question corresponds to one recommendation. Each clinical question should contain the four components known by the acronym ‘PICO’: Patients; Intervention; Comparison; and the Outcome(s) of interest, both beneficial and harmful. For example, ‘Should patients with a first generalized tonic-clonic seizure (Patients) be treated immediately with antiepileptic drugs (Intervention) or should the treatment be deferred to the second seizure (Comparison) to achieve a 2-year seizure-free period (Outcome)?’ When many outcomes are possible for each clinical question, the GRADE approach asks panellists to make explicit judgments about the importance of each outcome for making a recommendation. GRADE suggests a nine-point scale to judge importance [15]. Scores 7–9 identify outcomes of critical importance for decision-making. Ratings of 4–6 represent outcomes that are important but not critical. Ratings of 1–3 are items of limited importance. Because ranking outcomes involves a certain amount of subjectivity and yet is crucial for the decision-making process, panellists should decide a priori which outcomes are critical and patient-oriented. If evidence is lacking for an outcome that panellists consider important, this should be acknowledged, rather than ignored.
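The PICO structure and the nine-point importance scale described above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `ClinicalQuestion` and `importance_category` are hypothetical, and the outcome ratings (and the two extra outcomes) are invented for the example, not taken from any panel.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def importance_category(score: int) -> str:
    """Map a 1-9 importance rating to the three bands described in the text."""
    if not 1 <= score <= 9:
        raise ValueError("importance score must be between 1 and 9")
    if score >= 7:
        return "critical"
    if score >= 4:
        return "important but not critical"
    return "limited importance"


@dataclass
class ClinicalQuestion:
    """One clinical question with its PICO components (hypothetical structure)."""
    patients: str
    intervention: str
    comparison: str
    # Outcome name -> importance rating on the nine-point scale.
    outcomes: Dict[str, int] = field(default_factory=dict)

    def critical_outcomes(self) -> List[str]:
        """Return the outcomes rated 7-9, in the order they were entered."""
        return [name for name, score in self.outcomes.items()
                if importance_category(score) == "critical"]


question = ClinicalQuestion(
    patients="patients with a first generalized tonic-clonic seizure",
    intervention="immediate treatment with antiepileptic drugs",
    comparison="treatment deferred to the second seizure",
    outcomes={
        "2-year seizure-free period": 8,  # hypothetical rating
        "adverse drug effects": 7,        # hypothetical outcome and rating
        "laboratory abnormalities": 3,    # hypothetical outcome and rating
    },
)
print(question.critical_outcomes())
```

Recording the ratings in the protocol in a form like this makes it easy to show, a priori, which outcomes the panel treated as critical.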

Finding the evidence

Every clinical question should be answered based on one or more systematic reviews of the best available evidence. Panellists can either conduct the systematic review themselves or identify an existing high-quality systematic review. This systematic review will serve to create a summary of available evidence. The systematic review should identify RCTs first. However, for many therapeutic options, there is little randomized evidence and non-randomized observational studies (cohort, case–control, before–after and series of cases) also have to be considered. Such papers can be identified from several bibliographic databases. Priority must be given to Medline, EMBASE and the Cochrane Library. Other sources of data include national or regional databases (e.g. LILACS), specialized databases (e.g. PSYC-INFO), clinical trials registers, Health Technology Assessment Agencies, and previously published guidelines (see http://www.cochrane.org/handbook/chapter-6-searching-studies for a comprehensive list of sources). The process of searching relevant data must be explained in the guideline in order to be reproducible.

Grading the quality of evidence for each important outcome

The study design remains crucial in GRADE to determine our confidence in the estimates of benefit or harm. GRADE classifies the quality of evidence in one of four levels – high, moderate, low and very low (Table 1). Unlike previous rating systems, the GRADE approach does not rate every single article, but rates the overall underlying literature for each important outcome. The study design determines the starting level, which is then modified according to several factors. Thus, if the underlying literature consists of one or more RCTs, the starting level is high quality. In contrast, if only one or more observational studies are available, the starting level of quality will be low. Thereafter, the level for the quality of evidence for each outcome is reduced (downgraded) or raised (upgraded) according to several factors (Table 2). These factors are regarded overall, i.e. for all studies. GRADE focuses on assessment of outcomes across studies rather than assessment of studies across outcomes. Five factors can downgrade the quality of evidence: study limitations; inconsistency; imprecision; indirectness; and publication bias. On the other hand, three factors can upgrade the quality of evidence: large treatment effect; a dose–response gradient; or likely reduction of the effectiveness of an intervention by unmeasured confounders. Although theoretically these three factors for upgrading the quality of evidence may apply to RCTs, they are especially applicable to observational studies. A series of articles in the Journal of Clinical Epidemiology by the GRADE group specifically deals with each of these factors and will be referred to in each appropriate section of this guidance [13]. At the end of this process, the final evidence from RCTs may be graded downwards to very low, and evidence from observational studies upwards to high.

Table 1. The quality of evidence according to GRADE

| Rank | Explanation | Examples | Symbol |
|---|---|---|---|
| High quality | We are very confident that the true effect lies close to that of the estimate of the effect. | Randomized trials without serious limitations; well-performed observational studies with very large effects (or other qualifying factors) | θθθθ |
| Moderate quality | We are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different. | Randomized trials with serious limitations; well-performed observational studies yielding large effects | θθθ |
| Low quality | Our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect. | Randomized trials with very serious limitations; observational studies without special strengths or important limitations | θθ |
| Very low quality | We have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of the effect. | Randomized trials with very serious limitations and inconsistent results; observational studies with serious limitations (e.g. case series or case reports) | θ |

Table 2. The process of grading quality of evidence according to GRADE [table supplied as an image in the original; not reproduced here]

Factors that decrease the quality of evidence

The following five factors may decrease the quality of evidence supporting a recommendation.

  1. Study limitation (risk of bias) [17]. If studies suffer from major limitations, this may lead to a biased assessment of the treatment effect. Study limitations in RCTs are: lack of allocation concealment; lack of blinding; incomplete accounting of patients and outcome events; selective outcome reporting; and other biases (early interruption for benefit, use of non-validated outcome measures, carryover effect and recruitment bias). Study limitations in observational studies are: failure to include a control population; flawed measurement of exposures and outcomes; failure to adequately control confounding and incomplete follow-up.
  2. Inconsistency of results [20]. Studies are heterogeneous when they yield widely differing estimates of the treatment effect. Different criteria for assessing consistency are available. If investigators cannot find reasons that explain heterogeneity (e.g. different inclusion criteria, different doses, different duration of the follow-up) and inconsistency is large and unexplained, rating down quality for inconsistency is appropriate, particularly if some studies suggest substantial benefit, and others no effect or harm (rather than only large versus small effects).
  3. Imprecision [19]. Imprecision appears when studies include relatively few patients and few events, with wide confidence intervals for effect estimates. Furthermore, if the overall estimate includes benefits and harms (i.e. the range of the confidence interval) the quality of evidence will also be judged lower.
  4. Indirectness [21]. There are two types of indirectness of evidence. The first type arises from differences between the population, intervention, comparator and outcome of interest, and those included in the relevant studies. The second type occurs when considering, for example, use of one of two active drugs that have not been tested head-to-head. Randomized trials may have compared one drug with placebo and the other with placebo, thus allowing only indirect comparisons of the magnitude of effect of both drugs. For example, in multiple sclerosis, many therapeutic strategies are available, yet their relative effectiveness and safety are uncertain because of the limited number of trials directly comparing each strategy.
  5. Publication bias [18]. Publication bias exists if investigators fail to report studies, usually those with no significant results. A simple method for deciding if publication bias exists is the so-called funnel plot in which the effect estimates are plotted with their corresponding standard errors. There is no publication bias if the estimates for all the trials lie symmetrically around the overall effect estimate.
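The funnel plot mentioned under publication bias can be prepared numerically before it is drawn. The sketch below is an assumption-laden illustration, not a method prescribed by GRADE: it takes the overall effect to be a fixed-effect, inverse-variance pooled estimate, and it uses a simple count of studies on each side of that estimate as a crude stand-in for visually inspecting symmetry. The function names and the input values are hypothetical.

```python
from typing import List, Tuple


def pooled_estimate(effects: List[float], std_errors: List[float]) -> float:
    """Fixed-effect pooled estimate using inverse-variance weights (assumed model)."""
    if not effects or len(effects) != len(std_errors):
        raise ValueError("effects and std_errors must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in std_errors]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def funnel_points(effects: List[float], std_errors: List[float]) -> List[Tuple[float, float]]:
    """Coordinates of the funnel plot: (effect estimate, standard error) per study."""
    return list(zip(effects, std_errors))


def side_counts(effects: List[float], std_errors: List[float]) -> Tuple[int, int]:
    """Number of studies below and above the pooled estimate (crude symmetry check)."""
    pooled = pooled_estimate(effects, std_errors)
    below = sum(1 for y in effects if y < pooled)
    above = sum(1 for y in effects if y > pooled)
    return below, above


# Hypothetical log risk ratios and standard errors, for illustration only.
effects = [-0.6, -0.4, -0.3, -0.1]
std_errors = [0.4, 0.2, 0.1, 0.3]
print(pooled_estimate(effects, std_errors))
print(side_counts(effects, std_errors))
```

A markedly unbalanced count, particularly among the small studies with large standard errors, would be a prompt to inspect the plot itself; it is not a formal test.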

Factors that increase the quality of evidence

The following three factors [22] may increase the quality of evidence supporting a recommendation (Table 2).

  1. Large treatment effect. This is the most common reason for upgrading the quality of evidence. When methodologically strong observational studies yield large or very large and consistent estimates of the magnitude of a treatment effect, we may be more confident about the results.
  2. Dose–response gradient. The greater the quantity or duration of a beneficial exposure is, the more likely a desirable effect and the less likely an adverse event will occur.
  3. Confounding. If all plausible confounders and biases unaccounted for in the adjusted analysis of a rigorous observational study resulted in an underestimate of an apparent treatment effect, the confidence in the estimated effects would be increased.
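The grading process described in this section (a starting level set by study design, then downgrading and upgrading) can be written as a small function. This is a minimal sketch under stated assumptions: each factor is assumed to move the rating by one level ("serious") or two levels ("very serious"), which matches the examples in Table 1 but is a simplification of the panel's judgment, and the names `grade_quality`, `DOWNGRADE_FACTORS` and `UPGRADE_FACTORS` are hypothetical.

```python
from typing import Dict, Optional

LEVELS = ["very low", "low", "moderate", "high"]
START_LEVEL = {"rct": 3, "observational": 1}  # index into LEVELS
DOWNGRADE_FACTORS = {"study limitations", "inconsistency", "imprecision",
                     "indirectness", "publication bias"}
UPGRADE_FACTORS = {"large treatment effect", "dose-response gradient", "confounding"}


def _check(steps_by_factor: Dict[str, int], allowed: set) -> int:
    """Validate factor names and step sizes, and return the total number of steps."""
    for name, steps in steps_by_factor.items():
        if name not in allowed:
            raise ValueError(f"unknown factor: {name}")
        if steps not in (1, 2):  # assumption: 1 = serious, 2 = very serious
            raise ValueError("each factor moves the rating by 1 or 2 levels")
    return sum(steps_by_factor.values())


def grade_quality(design: str,
                  downgrades: Optional[Dict[str, int]] = None,
                  upgrades: Optional[Dict[str, int]] = None) -> str:
    """Quality of evidence for ONE outcome across the whole body of studies."""
    if design not in START_LEVEL:
        raise ValueError("design must be 'rct' or 'observational'")
    down = _check(downgrades or {}, DOWNGRADE_FACTORS)
    up = _check(upgrades or {}, UPGRADE_FACTORS)
    level = START_LEVEL[design] - down + up
    # The rating cannot fall below 'very low' or rise above 'high'.
    return LEVELS[max(0, min(level, len(LEVELS) - 1))]


print(grade_quality("rct", downgrades={"study limitations": 1}))
print(grade_quality("observational", upgrades={"large treatment effect": 2}))
```

The function deliberately takes the panel's judgments as input rather than computing them; its only purpose is to make the arithmetic from judgments to final level explicit and traceable.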

Grading the overall quality of evidence

After the process described above, panellists end with a defined grade of the quality of evidence for each important outcome. However, each clinical question may have many important outcomes, and panellists should usually provide a single rating of quality of evidence for every recommendation. This is a quite simple decision when the quality of evidence is similar across different outcomes, but difficult when the quality of evidence differs across important outcomes. The decision involves potentially subjective decision-making, regarding the relative weight of each outcome, especially in the instance of balance between outcomes addressing benefit and those addressing harm (adverse effects). For this reason, critical outcomes should be decided in the protocol, and the process must be clearly explained in the guideline (e.g. the panel placed the highest value on …, and less value on …). According to GRADE, the final overall quality of evidence grade for each clinical question should be based on the grade for the outcome with the lowest quality of evidence, if the outcome is crucial to making a decision [23].
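The rule for the overall grade (take the lowest quality among the outcomes that are critical to the decision) can be sketched as follows. The function name `overall_quality`, the input layout and the example values are hypothetical; "critical" is taken to mean an importance rating of 7–9 on the nine-point scale, and raising an error when no critical outcome is supplied is an assumption of this sketch.

```python
from typing import Dict, Tuple

LEVELS = ["very low", "low", "moderate", "high"]


def overall_quality(outcomes: Dict[str, Tuple[int, str]]) -> str:
    """Overall quality for one clinical question.

    `outcomes` maps an outcome name to (importance rating 1-9, quality level).
    Only critical outcomes (rating 7-9) determine the overall grade.
    """
    critical = [quality for score, quality in outcomes.values() if score >= 7]
    if not critical:
        raise ValueError("no critical outcome (importance 7-9) was supplied")
    # The lowest-ranked level among the critical outcomes wins.
    return min(critical, key=LEVELS.index)


# Hypothetical outcomes and grades, for illustration only.
example = {
    "seizure freedom at 2 years": (8, "high"),
    "serious adverse effects": (7, "low"),
    "laboratory abnormalities": (3, "very low"),  # not critical, so ignored
}
print(overall_quality(example))  # low
```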

Determining the direction and strength of a recommendation

The last step for panellists is going from evidence to recommendations. Direction means to set a recommendation ‘for’ or ‘against’. Strength of recommendation in the GRADE system has only two levels, ‘strong’ or ‘weak’. GRADE recognizes four determinants of direction and strength of recommendation: (i) balance between desirable and undesirable effects; (ii) quality of evidence; (iii) values and preferences; and (iv) costs (Table 3). The balance between desirable and undesirable outcomes determines the direction of the recommendation, and this factor, along with the quality of the evidence, determines the strength of the recommendation. Both direction and strength may be modified after taking into account patients’ values and preferences, and the costs of the alternative strategies. This is again a process prone to a large amount of subjectivity; GRADE developers provide indications on how to make the process structured and transparent.

Table 3. Determinants of strength of recommendation, according to GRADE

| Factor | Comment |
|---|---|
| Quality of evidence | Strong recommendations usually require higher quality evidence for all the critical outcomes. The lower the quality of evidence, the less likely is a strong recommendation. |
| Balance between desirable and undesirable effects | Panellists should make stronger recommendations for interventions that influence outcomes with high patient importance. If the baseline risk is different among different populations, they should make separate recommendations. The larger the difference between the desirable and undesirable effects, the higher the likelihood that a strong recommendation is warranted. |
| Values and preferences | The more values and preferences vary, or the greater the uncertainty in values and preferences, the higher the likelihood that a weak recommendation is warranted. |
| Costs (resource allocation) | The higher the incremental cost, all else being equal, the less likely that the recommendation in favor of an intervention is strong. |

Guideline authors must firstly consider the direction of recommendations. The clinicians' goal is to achieve the greatest benefit with the lowest harm; this implies a judgment of the balance between desirable and undesirable effects. The desirable consequences of using a particular therapeutic approach include reduced mortality and morbidity, improved quality of life, and lower rate of complications. The undesirable consequences include increased mortality and morbidity, adverse effects, and burden. Burdens are the demands of adhering to a recommendation that patients or caregivers may dislike, such as having to take medication or the inconvenience of frequent follow-up visits. When the desirable consequences of following a preventive or therapeutic option outweigh the undesirable ones, panellists would recommend that option. On the other hand, when the undesirable consequences outweigh the desirable ones, panellists would recommend against that option.

Secondly, panellists should define the strength of a recommendation; this reflects the degree of confidence that the desirable effects outweigh the undesirable ones. Quality of evidence is its first and principal determinant. The higher the quality of evidence, the higher the likelihood that a strong recommendation is warranted. However, high-quality evidence does not always imply strong recommendations, and a strong recommendation can arise from low-quality evidence. Balance between desirable and undesirable effects is the second determinant of the strength of a recommendation. When advantages or disadvantages are clear the recommendation may be strong. When advantages and disadvantages are closely balanced, or appreciable uncertainty exists about their magnitude, a weak recommendation becomes appropriate. The third determinant of the strength of recommendation is uncertainty about, or variability in, values and preferences. Patients may attribute different importance to different outcomes and sometimes their evaluation may differ from that of their doctors [26]. On the other hand, clinicians are becoming increasingly aware of the importance of individualized clinical decision-making, especially when the desirable and undesirable consequences of alternative management options are closely balanced, or uncertain. In such instances, panellists will offer weak recommendations. Because studies on values and preference estimates are often unavailable, there is a great amount of subjectivity in this decision. For this reason panellists should make their choices explicit (e.g. ‘in recommending against this …, we are placing a low value on the potential benefits of the drug, and a high value on avoiding its adverse effects …’). The fourth determinant of the strength of a recommendation is cost. The higher the cost of an intervention, the lower the likelihood that a strong recommendation is warranted. 
The EFNS goal is to provide guidelines that are potentially usable in all European countries. Cost is much more variable in different geographical areas than other outcomes [27], and the availability of resources varies widely across Europe.

Reaching consensus in different steps of the guideline production

There are many instances during GRADE guideline production where panellists have to make decisions, including ranking the importance of outcomes, judging the overall quality of studies, deciding the direction of the recommendation, weighing the strength of a recommendation, evaluating patients' values and preferences and their impact on recommendations, and evaluating costs. In all these situations, the risk is that panellists' judgment is influenced by subjective and highly variable aspects, such as strongly held personal opinions or dominance of one or more panel members. When panellists do not agree, GRADE suggests two alternative ways to reach consensus [12]: the Delphi method; or the nominal group technique, either with ranking or voting. To make this process transparent and reproducible, panellists should clearly establish the question and the measurement instrument [12]. Once consensus is reached, panellists should make the assumptions about their decisions explicit. If disagreement still exists, its nature and extent should be accounted for in the guideline and explained.

Interpreting the strength of recommendations

Table 4 summarizes several ways that developers and consumers of guidelines can interpret strong and weak recommendations. The strength of recommendation indicates ‘the extent of the grader's confidence that adherence to the recommendation will do more good than harm’.

Table 4. Implications of strong and weak recommendations for different groups of guideline users

| Target group | Strong recommendations | Weak recommendations |
|---|---|---|
| Patients | Most individuals in this situation would want the recommended course of action and only a small proportion would not. | The majority of individuals in this situation would want the recommended course of action, but many would not. |
| Clinicians | Most patients should receive the recommended course of action. Formal decision aids are not likely to be needed to help individuals make decisions consistent with their values and preferences. | Different choices will be appropriate for different patients. Doctors must make greater effort to help each patient to arrive at a management decision consistent with his or her values and preferences; decision aids and shared decision-making are particularly useful [28]. |
| Policy makers | The recommendation can be adopted as a policy in most situations. | Policy making will require substantial debate and involvement of many stakeholders. |
| Use of the recommendation as a quality criterion or performance indicator | Yes. | Performance could be measured by monitoring whether clinicians have discussed recommended actions with patients or their surrogates, or carefully documented the evaluation of benefits and harms in the patient's chart. |
| Implications for research | Further research is unlikely to change confidence in the estimate. | Further research is likely to change confidence in the estimate. |

Recommendations to use interventions in the context of research

On occasion, panellists may face questions about the use of therapeutic interventions associated with potentially appreciable benefits or harm, but with insufficient evidence to support an informed decision. In such cases, they might choose not to make any recommendation. In these situations, panellists may recommend that further research be carried out, and even suggest future lines of research.

Practical recommendations for the process of preparing a guideline according to GRADE

A step-by-step checklist for the process of preparing a guideline according to GRADE follows.

  • Step 1. Define and prioritize the key clinical questions. Each question must clearly state the four PICO components: Patient; Intervention; Comparison; and Outcomes.
  • Step 2. Identify all the important clinical outcomes, including harms, for each clinical question. The maximum number of outcomes for each question is seven. Rank the relative importance of each outcome, differentiating the critical from the important but not critical. Explain your choice. Very low-rated outcomes may not be considered in a recommendation.
  • Step 3. Either identify one or more existing high-quality systematic reviews or conduct a new systematic review. In the latter case, at least MEDLINE, EMBASE and the Cochrane Library should be searched. Other sources may also be searched, but gray literature does not need to be searched. Report the complete search string(s) for each database, using specific and predictive keywords as well as their combinations. Identify the best available evidence, starting with RCTs, followed by cohort studies, case–control studies, cross-sectional studies, ecological studies, etc. It is always necessary to collect the data from the paper itself, not from secondary literature. The full paper should always be read, not only the abstract. Data can be included from papers that have been accepted but not yet published. Unpublished data from randomized trials can be used provided they are of high quality. Reviewers may also contact the authors to get more details on single studies. Such exceptions should be explained in the search section of the report. Non-systematic reviews need not be included in the work of the task force. If they are referred to, their conclusions should never be used without independently evaluating the primary literature. It is appropriate to retrieve and discuss previous systematic reviews, guidelines (www.guidelines.gov) and health technology assessments (www.inahta.org). The guideline may or may not include a meta-analysis. If the task force decides to include a meta-analysis in its guideline, this should be done according to the Cochrane methods, using the RevMan software (http://ims.cochrane.org/revman/download).
  • Step 4. Assess the quality of studies in an evidence profile. Summarize the relevant evidence in a summary of findings table, including each relevant outcome [16]. An application called GRADE Profiler (GRADEpro) has been developed to assist in the process of grading quality and summarizing evidence (http://ims.cochrane.org/revman/gradepro). Additional information on the GRADE methodology and tutorials can be found at the GRADE website (http://www.gradeworkinggroup.org).
  • Step 5. Grade the quality of evidence for each relevant outcome. This does not mean grading the quality of each study, but grading the aggregate of studies that look at the effect of an intervention on one type of outcome. The GRADE levels for the quality of the evidence are ‘high’, ‘moderate’, ‘low’ or ‘very low’. GRADE's approach begins with the study design. If randomized trials are available, the starting level is ‘high quality’; if no randomized trials are available but observational studies are available, the starting level is ‘low quality’.
  • Step 6. Consider the five factors that may reduce grading of RCTs from ‘high’ to ‘moderate’, ‘low’ or ‘very low’, and of observational studies from ‘low’ to ‘very low’: (i) study limitations (risk of bias); (ii) inconsistencies between studies; (iii) indirectness of evidence; (iv) imprecision in estimates; (v) a high probability of reporting bias.
  • Step 7. Consider the three factors that may raise grading of observational studies from ‘low’ to ‘moderate’ or ‘high’: (i) large or very large and consistent estimates of a treatment effect; (ii) the presence of a dose–response gradient; (iii) a situation in which all plausible biases would decrease the magnitude of the effect. Consideration of all the criteria for downgrading the estimate of the quality of evidence must precede consideration of reasons for upgrading it.
  • Step 8. Grade the overall quality of the evidence for each clinical question. When many outcomes are possible for a clinical question, the grade for the overall quality of evidence is based on the grade for the outcome with the lowest quality of evidence, if that outcome is critical. Thus, critical outcomes determine the rating of quality of evidence across outcomes.
  • Step 9. Determine the direction and strength of a recommendation. Quality of evidence is only one of the four key factors determining the strength of a recommendation, according to GRADE. The others are the magnitude of the difference between the desirable and undesirable consequences, the certainty about values and preferences of patients, and the resource expenditure associated with the compared management options. Always explain your choices. Direction of a recommendation is either ‘for’ or ‘against’. A recommendation is graded either ‘strong’ (i.e. ‘We recommend …’ for a positive recommendation or ‘We do not recommend …’ for a negative recommendation) or ‘weak’ (‘We suggest …’ or ‘We do not suggest …’). On occasion, to avoid making statements about what should not be done (e.g. ‘we recommend that treatment A is not used’), the task force may recommend an alternative option stating what should be done (e.g. ‘we recommend that treatment B is used rather than treatment A’).
  • Step 10. Repeat steps 1–9 for each clinical question.
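
The grading logic of Steps 5–9 can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the GRADE tooling: the names `Outcome`, `outcome_quality`, `overall_quality` and `recommendation_phrase` are hypothetical, the example outcomes are invented, and the number of levels to subtract or add for each factor is left to the judgment of the task force. The sketch also assumes that the level is bounded at ‘very low’ after downgrading and before any upgrading is applied.

```python
from dataclasses import dataclass

# GRADE quality levels, ordered from lowest to highest (Step 5).
LEVELS = ["very low", "low", "moderate", "high"]


@dataclass
class Outcome:
    name: str
    critical: bool       # is this outcome critical for the clinical question?
    has_rcts: bool       # are randomized trials available for this outcome?
    downgrades: int = 0  # levels removed across the five factors of Step 6
    upgrades: int = 0    # levels added across the three factors of Step 7


def outcome_quality(o: Outcome) -> str:
    """Quality of the aggregate evidence for one outcome (Steps 5-7)."""
    # Step 5: the starting level depends on the study design.
    level = LEVELS.index("high" if o.has_rcts else "low")
    # Step 6: downgrading is considered first (assumption: floor at 'very low').
    level = max(level - o.downgrades, 0)
    # Step 7: upgrading is described for observational studies only.
    if not o.has_rcts:
        level = min(level + o.upgrades, len(LEVELS) - 1)
    return LEVELS[level]


def overall_quality(outcomes: list[Outcome]) -> str:
    """Step 8: the lowest quality among the critical outcomes."""
    critical = [o for o in outcomes if o.critical]
    if not critical:
        raise ValueError("at least one critical outcome is required")
    return min((outcome_quality(o) for o in critical), key=LEVELS.index)


def recommendation_phrase(direction: str, strength: str) -> str:
    """Step 9: standard wording for the direction and strength chosen."""
    phrases = {
        ("for", "strong"): "We recommend ...",
        ("against", "strong"): "We do not recommend ...",
        ("for", "weak"): "We suggest ...",
        ("against", "weak"): "We do not suggest ...",
    }
    return phrases[(direction, strength)]


# Hypothetical outcomes for a single clinical question.
outcomes = [
    Outcome("disability at 12 months", critical=True, has_rcts=True, downgrades=1),
    Outcome("mortality", critical=True, has_rcts=False, upgrades=1),
    Outcome("patient satisfaction", critical=False, has_rcts=False, downgrades=1),
]
print(overall_quality(outcomes))
print(recommendation_phrase("for", "weak"))
```

The sketch deliberately does not derive the strength of a recommendation from the quality level: as stated in Step 9, quality of evidence is only one of four factors, and the choice remains a documented judgment of the task force.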

Practical recommendations for the process of proposing, planning and writing an EFNS guideline*

*A flow-chart of the entire process is available online.

  • 1. Neurological Management Guidelines will be produced by task forces appointed by the Scientific Committee. Proposals for task forces concerning neurological management should be submitted to the Scientific Committee as a guideline protocol. The protocol should include the title, objectives, membership, conflicts of interest, a short (100–300 words) explanation as to why the guideline is needed, already existing guidelines on the same or related topic, search strategy, method for reaching consensus and time frame for accomplishment. The protocol should define the clinical questions and outcomes to be considered. Task forces will usually be appointed following a proposal from the chairperson of a Scientific Panel to the Scientific Committee. A template for a protocol is provided online. The task force will consist of a chairperson and at least six but not usually more than 12 members. No more than two members should usually come from any one country. Conflicts of interest must be declared by members at the time of the formation of the task force. The chairperson should be free from conflicts of interest. If feasible, the group should include a patient advocate (normally an officer from a European patient organization if the task force deals with a clinically relevant topic), and other relevant specialists and health professionals. If task forces have a budget, they must nominate a secretary and treasurer, and submit an annual account to the Management Committee.
  • 2. The task force should identify in the protocol which tasks will be undertaken by each member, including those who will search the literature, prepare the evidence tables and grade the evidence, and prepare the first draft of the guideline. The task force may apply to the EFNS for support to the authors for these tasks.
  • 3. Irreconcilable differences between group members should be referred to the Scientific Committee through its chairman.
  • 4. The task force should submit the protocol to the chairperson of the Scientific Committee for approval. The chairperson will have the protocol reviewed by the Scientific Committee and the Management Committee and advise the task force within four weeks whether it is accepted as the basis for the guideline.
  • 5. EFNS encourages the production of guidelines in agreement with other scientific societies. In this case, EFNS allows simultaneous publication in the EFNS journal and in the journal of the collaborating organization, provided that papers are published at the same time.
  • 6. The guideline should be based on a systematic review of the evidence. If this does not exist, the panellists must prepare a systematic review specifically for the guideline.
  • 7. The levels of recommendation should be based on the GRADE system.
  • 8. The guidelines format will be that for the European Journal of Neurology, using a template with the following sections:
    1. Title. This should read: EFNS Guideline on … Report of EFNS task force on … (title of task force, if different from the topic of the guideline).
    2. Structured abstract containing the main conclusions.
    3. Membership of task force.
    4. Objectives.
    5. Background.
    6. Search strategy.
    7. Method for reaching consensus.
    8. Results.
    9. Recommendations.
    10. Statement of the time when the guidelines will likely need to be updated.
    11. Conflicts of interest.
    12. References.
    13. Online material (e.g. summary of findings tables).
  • 9. The length of the guideline report should be considered. While an ideal length is up to 6000 words, it must be acknowledged that many guidelines deal with a number of questions and therefore need more space. In view of publication in the European Journal of Neurology, additional files can be submitted for the online-only publication section. Also, supplementary material may be published on the EFNS website. The authors will be the EFNS task force on management/diagnosis/other of condition. The authors will be listed as members of the task force with the chairman first, and the other authors in alphabetical order. The task force should submit the completed guideline for approval to the chairperson of the Scientific Committee.
  • 10. The Scientific Committee will have the proposed management guideline reviewed by its members, the EFNS Management Committee and the chairpersons of any Scientist Panels that might be affected by the guidelines although not involved in their preparation. External peer reviewing will be sought from content and methodological experts. Within 8 weeks from submission, the chairperson of the Scientific Committee will notify the chairperson of the task force whether the guidelines have been accepted as the official guidelines of the EFNS or not. If revision is needed, the task force will prepare a revised version and submit this to the review process again, highlighting the revisions and documenting the responses to each of the referees' comments.
  • 11. Following approval, the guideline will be submitted by the chairperson of the task force to the editor/s of the European Journal of Neurology for publication. The editor will have the power to accept or reject the guidelines for publication, and may make minor editorial changes.
  • 12. The validity of published guidelines will be reviewed regularly by the chairpersons of the task force and the relevant Scientist Panel at least every 2 years. Guidelines will be published on the EFNS website and in the European Journal of Neurology.
  • 13. National societies will be encouraged to translate guidelines for dissemination in their own countries. Guidelines may be translated and published in local language journals.
  • 14. A list of the EFNS guidelines under preparation will be placed on the EFNS website.
  • 15. Guidelines whose quality level does not qualify for EFNS guidelines could be considered as an EFNS consensus review. This might especially apply to diagnostic guidelines, until more experience is gained on the application of the GRADE system.
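
The composition rules for a task force given in item 1 lend themselves to a simple check when a protocol is being drafted. The following Python sketch is purely illustrative: the function `check_task_force` and the example proposal are hypothetical, and it assumes that the chairperson is counted separately from the members. Because the text says ‘usually’, the checks return warnings instead of rejecting a proposal.

```python
from collections import Counter


def check_task_force(member_countries, chair_has_conflict):
    """Return a list of warnings for a proposed task force.

    member_countries: one country name per member, chairperson excluded
    (an assumption of this sketch).
    """
    warnings = []
    n = len(member_countries)
    # A chairperson plus at least six, but not usually more than 12, members.
    if n < 6:
        warnings.append(f"only {n} members; at least six are expected")
    elif n > 12:
        warnings.append(f"{n} members; more than 12 is unusual")
    # No more than two members should usually come from any one country.
    for country, count in sorted(Counter(member_countries).items()):
        if count > 2:
            warnings.append(f"{count} members from {country}; usually no more than two")
    # The chairperson should be free from conflicts of interest.
    if chair_has_conflict:
        warnings.append("chairperson should be free from conflicts of interest")
    return warnings


# Hypothetical proposal: seven members, three of them from the same country.
proposal = ["Italy", "Italy", "Italy", "France", "Norway", "Hungary", "Cyprus"]
for warning in check_task_force(proposal, chair_has_conflict=False):
    print(warning)
```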

Acknowledgements

The authors thank Julia Scheidl from the EFNS Guideline Production Group and Lisa Müller from the EFNS Head Office for their indispensable assistance. Gratitude also goes to the EFNS members who critically reviewed the manuscript: A. Schapira, N.E. Gilhus, T. Kyriakides, L. Csiba and R.A.C. Hughes.

Conflict of interest statement

The authors have no conflict of interest related to this paper.
