Classification criteria for rheumatic diseases: Why and how?
Version of Record online: 28 SEP 2007
Copyright © 2007 by the American College of Rheumatology
Arthritis Care & Research
Volume 57, Issue 7, pages 1112–1115, 15 October 2007
How to Cite
Dougados, M. and Gossec, L. (2007), Classification criteria for rheumatic diseases: Why and how? Arthritis & Rheumatism, 57: 1112–1115. doi: 10.1002/art.23015
- Issue online: 28 SEP 2007
- Manuscript Accepted: 15 JUN 2007
- Manuscript Received: 2 MAY 2007
In the current issue of Arthritis Care & Research, Johnson et al report the results of a systematic literature review of the main psychometric properties of classification criteria for rheumatic diseases (1). A previous editorial (2), written by members of the American College of Rheumatology (ACR) Classification and Response Criteria Subcommittee of the Committee on Quality Measures, outlined the main steps to follow when proposing classification and/or responder criteria; Johnson et al now provide useful additional information on this topic.
The main scientific societies in rheumatology, such as the ACR and the European League Against Rheumatism (EULAR), strongly support research on classification and responder criteria and have established standing committees for this purpose. Within EULAR, 2 committees deal with criteria: the Standing Committee of Epidemiology and the Standing Committee for International Studies Including Clinical Trials. Within the ACR, the Committee on Quality Measures is responsible for this area.
The role of the ACR and EULAR in defining criteria is to promote procedures that optimize collaboration between experts in a given rheumatologic field (e.g., rheumatoid arthritis, ankylosing spondylitis, gout, or osteoarthritis) and experts in clinical epidemiology (3). Most experts in a specific field of rheumatology are not also experts in clinical epidemiology and therefore may not be the most appropriate people to plan, conduct, and analyze the studies needed to propose criteria for rheumatic diseases. Conversely, most experts in clinical epidemiology are not experts in specific rheumatic disorders. Their expertise therefore needs to be combined, and the role of the relevant EULAR and ACR committees is to provide clinical epidemiology expertise to colleagues who are experts in a specific field and plan to propose criteria.
Such expertise can be provided either through critical review of applications by the appropriate committee or by nominating a clinical epidemiologist to the task force charged with proposing the criteria. In the latter case, which is the current EULAR procedure for developing recommendations (3), each task force must nominate both a convener, who is usually the expert in the field of research, and a clinical epidemiologist, who may have no particular experience with the disease in question. For example, Bernard Combe and Robert Landewé acted as the convener and the clinical epidemiologist, respectively, for the EULAR recommendations for the management of early arthritis (4).
The article by Johnson et al (1) explicitly describes a basis for the recommendations and procedures proposed by the ACR Classification and Response Criteria Subcommittee of the Committee on Quality Measures regarding the elaboration and validation of criteria for rheumatic diseases. It should be emphasized that, because of the screening technique Johnson et al used to select the sets of criteria they assessed, their review does not cover all existing sets of criteria; rather, it aims to raise awareness of methodologic issues regarding diagnostic and classification criteria. The article thus provides a basis for further research on criteria (i.e., it can serve as an agenda for future work). We strongly recommend reading its Methods section carefully, as we feel that it reflects the suggestions of the ACR Quality Measures Committee, which were presented in an earlier editorial in Arthritis Care & Research (2) and can now be described in more detail.
Points to Consider for Elaboration of Criteria
Researchers interested in developing criteria may use the article by Johnson et al as a basis for proposing procedures and as a guide when aiming to produce a set of criteria. We would like to emphasize some points from Johnson et al that may interest clinicians who want to propose either a diagnostic or a classification set of criteria for a specific rheumatic disease. These points are summarized in Table 1 and described below. Our objective is to bridge the gap between the methodologist or statistician and the clinician; we have tried to clarify each issue with the clinician in mind, using practical examples.
Table 1.

| Question | Options or comments |
|---|---|
| 1. What is the exact purpose of the set of criteria? | A. Classification: to differentiate and optimally classify one rheumatic disease from another in order to conduct clinical research |
| | B. Classification: to differentiate a rheumatic disease from the total population in order to conduct epidemiologic studies |
| | C. Both of the above |
| | D. Diagnosis: to differentiate one rheumatic disease from another in clinical practice |
| 2. How have the items (potential candidate criteria to include in the set of criteria) been selected? | A. Literature search |
| | B. Expert opinion |
| | C. Combination of literature search and expert opinion |
| | D. Patient opinion |
| 3. Has each potential candidate criterion been assessed regarding psychometric properties? | Psychometric properties include face validity, content validity, construct validity, reliability, precision, feasibility, and discriminant capacity |
| 4. Derivation study: is there a patient-based study to derive the composite criteria? If so, what is the design of this study aimed at elaborating the set of criteria? | A. Cross-sectional study |
| | B. Retrospective study |
| | C. Longitudinal prospective study |
| 5. Derivation sample: what is the origin of the patients? | A. Academic rheumatology practice |
| | B. Community rheumatology practice |
| | C. Nonrheumatology practice |
| 6. Derivation sample: what is the origin of the control subjects? | A. Patients with rheumatic symptoms |
| | B. Patients with rheumatic disease |
| | C. Patients with nonrheumatic disease |
| | D. Healthy participants |
| 7. Derivation sample: who are the clinicians involved, i.e., providing cases and controls, and assessing the gold standard? | A. Same clinicians as the ones proposing the criteria |
| | B. Completely different clinicians from the ones proposing the criteria |
| | C. Overlapping group |
| 8. Derivation study: what is the gold standard? | A. Validated set of criteria |
| | B. Expert opinion |
| 9. Which statistical technique is used to select the items to be included in the set of criteria? | A. Sensitivity and specificity |
| | B. Receiver operating characteristic curves |
| | C. Regression models (see text for other potential techniques) |
| 10. Which presentation is chosen for the set of criteria? | A. Tree |
| | B. List |
| | C. Weighted list |
| 11. Has the proposed set of criteria been assessed regarding psychometric properties? | Psychometric properties include face validity, content validity, construct validity, reliability, precision, feasibility, and discriminant capacity |
| 12. External validation: is there an external validation of the proposed set of criteria? If so, how? | See items 4–8 above |
What is the purpose of the criteria set?
The purpose for which a set of criteria will be used should be stated explicitly. Criteria can be developed for diagnosis (i.e., to help the clinician reach a diagnosis in a given patient seen in an outpatient clinic) or, more commonly, for classification. Classification criteria may serve clinical research in rheumatology (where a patient with a rheumatic disease must be distinguished from other patients attending a rheumatology outpatient clinic) or epidemiologic studies (where a patient with a rheumatic disease must be distinguished within a large population that includes healthy subjects).
Diagnostic criteria (i.e., criteria developed for diagnostic purposes in the clinic) differ fundamentally in their conception from classification criteria: the study design, choice of populations, and gold standard used to develop diagnostic criteria all differ from those used for classification criteria.
It should be noted that many sets of criteria in rheumatology were developed as classification criteria for clinical research but are widely used as diagnostic criteria. This is, for example, the case with the ACR (formerly the American Rheumatism Association) criteria for the classification of rheumatoid arthritis (5) and the European Spondylarthropathy Study Group (ESSG) preliminary criteria for the classification of spondylarthropathy (6).
Finally, the conceptual framework of the disease (i.e., whether the criteria will explore all facets of the disease) should be stated. This is particularly important for biologic findings. For example, criteria for rheumatoid arthritis published before 1995 do not include anti–citrullinated protein antibodies (5), which may make it necessary to update these criteria. A similar problem arises, for example, with the use of autoantibodies in vasculitis. Because such shortcomings depend on the date at which criteria were developed, the conceptual framework of the disease at the time of development should be stated explicitly.
How have the items included in the composite criterion been selected?
It is important to determine how the potential candidate criteria for the set have been selected. For example, candidate items for a set of criteria for ankylosing spondylitis might include low back pain, peripheral synovitis, HLA–B27, and uveitis. At this stage, the aim is to obtain a comprehensive list of potential criteria. This step is important because it ensures that the items, and therefore the resulting set of criteria, have potential face and content validity (i.e., reflect the domains of the disease). Not only should each item reflect the disease, but the full list of potential criteria should also cover the full conceptual framework of the target disease. If a domain is left out at this stage, the final set of criteria will not reflect the whole disease. For example, if uveitis were not included as a candidate item, it could never be considered for the final set, and eye involvement would be missing from the ankylosing spondylitis criteria. At a minimum, a comprehensive literature search should be performed to create the list of items to consider; it is usually best to combine this search with expert opinion. Patient opinion may also be important here if the criteria are intended to reflect patients' assessment of their disease.
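The domain-coverage check described above can be made explicit by tagging each candidate item with the domain it is meant to reflect and listing any domain of the conceptual framework that no item covers. The Python sketch below is a hypothetical illustration: the items come from the ankylosing spondylitis example in the text, but the domain labels and the function name `uncovered_domains` are assumptions, not part of any published procedure.

```python
# Hypothetical illustration: each candidate item is tagged with the disease domain
# it is assumed to reflect. The domain labels are assumptions, not from the source.
CANDIDATE_ITEMS: dict[str, str] = {
    "low back pain": "axial",
    "peripheral synovitis": "peripheral joints",
    "HLA-B27": "genetic background",
    "uveitis": "eye",
}

# Domains that the (assumed) conceptual framework of the disease contains.
FRAMEWORK_DOMAINS: set[str] = {"axial", "peripheral joints", "genetic background", "eye"}


def uncovered_domains(items: dict[str, str], domains: set[str]) -> set[str]:
    """Return the framework domains that no candidate item reflects."""
    covered = set(items.values())
    return domains - covered


if __name__ == "__main__":
    print(uncovered_domains(CANDIDATE_ITEMS, FRAMEWORK_DOMAINS))  # set()
    # Leaving uveitis off the candidate list leaves the eye domain unrepresented.
    without_uveitis = {item: d for item, d in CANDIDATE_ITEMS.items() if item != "uveitis"}
    print(uncovered_domains(without_uveitis, FRAMEWORK_DOMAINS))  # {'eye'}
```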
Has each potential candidate criterion been assessed in terms of psychometric properties?
The Outcome Measures in Rheumatology Clinical Trials (OMERACT) filter (7) can be used to assess the psychometric properties of each candidate item. The OMERACT group defined several properties that a criterion must have to be used for assessment. First, the OMERACT filter specifies that a criterion must be truthful, i.e., it must reflect what it is supposed to reflect: does uveitis reflect eye involvement in ankylosing spondylitis? This corresponds to face, content, and construct validity. A criterion must also be reliable, i.e., reproducible: a patient asked twice whether he has ever had uveitis should give the same answer both times, and 2 rheumatologists assessing synovitis in the same patient should reach similar findings. A criterion must be discriminant, i.e., it must distinguish between the 2 situations of interest: is uveitis more frequent in ankylosing spondylitis than in rheumatoid arthritis (or in the general population)? Finally, a criterion must be feasible, i.e., acceptable to patients, easy to obtain, and not too costly to assess. At this stage, each candidate item should be assessed separately.
Often, the psychometric properties of an item have already been assessed by other researchers, and the data can be obtained through a literature review. Obtaining such data is important because, in composite criteria, the weakest item can limit the performance of the whole set.
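Inter-observer reliability of a yes/no item such as synovitis is commonly summarized with Cohen's kappa, which corrects the observed agreement between 2 raters for the agreement expected by chance. The text does not prescribe a particular statistic; the Python sketch below, with hypothetical findings for 10 patients, shows this one common choice.

```python
from collections import Counter


def cohens_kappa(rater_a: list[bool], rater_b: list[bool]) -> float:
    """Cohen's kappa for two raters making the same yes/no judgement on each patient."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and paired patient by patient")
    n = len(rater_a)
    # Observed agreement: proportion of patients rated identically by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own yes/no frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[v] * freq_b[v] for v in (True, False)) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when both raters give one identical rating")
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical synovitis findings for 10 patients examined by 2 rheumatologists.
    rheumatologist_1 = [True, True, True, True, False, False, False, False, True, False]
    rheumatologist_2 = [True, True, True, False, False, False, False, True, True, False]
    print(round(cohens_kappa(rheumatologist_1, rheumatologist_2), 2))  # 0.6
```

Here the 2 rheumatologists agree on 8 of 10 patients, but because half of that agreement would be expected by chance, kappa is 0.6 rather than 0.8.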
Is there a patient-based study from which to derive the composite criteria?
If such a study exists, what is its design? In some cases, a set of criteria is derived entirely from the published literature and/or expert opinion. In other cases, criteria sets are derived from patient data: the candidate criteria are evaluated in different combinations in a group of patients to find the best combination. According to the literature review by Johnson et al (1), only half of the criteria sets currently used in rheumatology were derived from patient data.
The design of the study depends on its content and on the objective of the criteria. Possible designs include cross-sectional studies (where the candidate set of criteria is evaluated at the same time as the gold standard), retrospective studies (where the candidate set of criteria is evaluated retrospectively, using data that were already collected), and longitudinal studies (where the gold standard is obtained after a period of followup). The longitudinal design is particularly suited to diagnostic criteria, such as criteria developed for early arthritis, where the gold standard could be fulfillment of the ACR criteria for rheumatoid arthritis (5) after 2 years.
What is the origin of the patients?
If patient data are used, the origin of the patients should be described in detail. Patients can be recruited from academic rheumatology practice, community rheumatology practice, or nonrheumatology practice. The number of patients included should be appropriate to the frequency of the disease. Johnson et al note in their literature review that several criteria sets were developed from small groups of patients (<50); this seems hard to justify for common diseases such as osteoarthritis and rheumatoid arthritis, for which larger samples are readily available.
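The concern about small derivation samples can be made concrete with the width of a confidence interval around an estimated sensitivity. The Python sketch below assumes the Wilson score interval (one standard choice for a binomial proportion; the text does not name a method) and uses hypothetical counts: the same 85% sensitivity observed in 40 and in 400 patients. The function name `wilson_interval` is illustrative.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval (default 95%) for a proportion such as sensitivity."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    # Centre is shifted toward 0.5; half-width shrinks roughly as 1 / sqrt(n).
    centre = (p + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical: the same observed sensitivity in a small and a larger sample.
    for detected, cases in [(34, 40), (340, 400)]:
        low, high = wilson_interval(detected, cases)
        print(f"n={cases}: sensitivity {detected / cases:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

With these counts the 95% interval runs from roughly 0.71 to 0.93 for 40 patients but only from roughly 0.81 to 0.88 for 400 patients, so a criterion derived from fewer than 50 patients leaves considerable uncertainty about its true sensitivity.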
What is the origin of the control subjects?
The choice of control subjects depends on the objective of the criteria. If the criteria are to be used in epidemiologic studies, cases and controls should ideally be selected from the community. Healthy control subjects should be distinguished from control subjects with a different rheumatic disease in a rheumatology setting (e.g., patients with rheumatoid arthritis as controls when developing criteria for ankylosing spondylitis), and from control subjects with a nonrheumatic disease or recruited in a nonrheumatology setting (e.g., control subjects recruited from internal medicine when developing criteria for rheumatoid arthritis).
Who are the clinicians providing patients?
Ideally, to reduce bias, the clinicians who provide the patients and control subjects for a study should not be the ones proposing the criteria. However, in practice, it is often more feasible for the same clinicians to do both.
What is the gold standard?
To evaluate a candidate set of criteria, a gold standard must be defined against which the set is compared. The gold standard can be an existing, previously validated set of criteria. For example, Rudwaleit et al (8) used the modified New York criteria for ankylosing spondylitis (9) as the gold standard when elaborating criteria for ankylosing spondylitis. Expert opinion (i.e., asking experts whether a patient has the disease in question) can also serve as the gold standard; for example, Dougados et al used expert opinion as the gold standard when elaborating the ESSG preliminary criteria for the classification of spondylarthropathy (6).
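Once a gold standard is chosen, each subject has 2 labels (fulfills the candidate criteria or not; has the disease according to the gold standard or not), and the basic evaluation is a 2 × 2 table. The Python sketch below is a minimal illustration with hypothetical data; the names `Accuracy` and `compare_with_gold_standard` are assumptions, and the gold-standard labels could come from expert opinion or from a validated criteria set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accuracy:
    """2 x 2 table of a candidate criteria set against the gold standard."""
    true_pos: int
    false_neg: int
    true_neg: int
    false_pos: int

    @property
    def sensitivity(self) -> float:
        # Proportion of gold-standard cases that fulfill the candidate criteria.
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        # Proportion of gold-standard non-cases that do not fulfill the criteria.
        return self.true_neg / (self.true_neg + self.false_pos)


def compare_with_gold_standard(meets_criteria: list[bool], gold_standard: list[bool]) -> Accuracy:
    """Cross-tabulate each subject's criteria result against the gold standard."""
    if len(meets_criteria) != len(gold_standard):
        raise ValueError("each subject needs a criteria result and a gold-standard label")
    pairs = list(zip(meets_criteria, gold_standard))
    return Accuracy(
        true_pos=sum(c and g for c, g in pairs),
        false_neg=sum(not c and g for c, g in pairs),
        true_neg=sum(not c and not g for c, g in pairs),
        false_pos=sum(c and not g for c, g in pairs),
    )


if __name__ == "__main__":
    # Hypothetical data for 8 subjects (4 cases, 4 controls by the gold standard).
    criteria = [True, True, True, False, False, False, True, False]
    gold = [True, True, True, True, False, False, False, False]
    result = compare_with_gold_standard(criteria, gold)
    print(result.sensitivity, result.specificity)  # 0.75 0.75
```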
Which statistical technique is used to select the items to be included in the criteria set?
There are many statistical techniques for item selection, including sensitivity and specificity, receiver operating characteristic curves, and regression models (Table 1), as well as recursive partitioning, frequency comparisons, chi-square tests and t-tests, latent class analysis, factor analysis, Wilcoxon's rank sum test, and cluster analysis. These techniques are complex and are the domain of the statistician. However, it is important to understand that they all share the same objective: to find the optimal combination of items and the relative weight of each, i.e., the combination with the best balance between sensitivity and specificity. The ideal balance depends on the objectives of the criteria set: criteria used for diagnosis should be very sensitive, whereas criteria used in epidemiologic studies require high specificity (1).
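One way to see why epidemiologic criteria need high specificity is the positive predictive value (PPV), the proportion of subjects fulfilling the criteria who truly have the disease: PPV = Se × p / (Se × p + (1 - Sp) × (1 - p)), where Se is sensitivity, Sp specificity, and p the prevalence of the disease in the population studied. When p is low, as in a general population, even a small false-positive rate swamps the true cases. The Python sketch below uses hypothetical values for Se, Sp, and p.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that a subject fulfilling the criteria truly has the disease."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical prevalences: a rheumatology clinic vs. a general population.
    for setting, prevalence in [("clinic", 0.30), ("population", 0.01)]:
        for specificity in (0.90, 0.99):
            ppv = positive_predictive_value(0.90, specificity, prevalence)
            print(f"{setting}: specificity {specificity:.2f} -> PPV {ppv:.2f}")
```

With a sensitivity of 0.90 and a prevalence of 1%, raising specificity from 0.90 to 0.99 lifts the PPV from about 0.08 to about 0.48, whereas at a prevalence of 30% the PPV is already about 0.79 with the lower specificity.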
Whatever the technique chosen, the statistician usually tests several combinations of items and relative item weights at this phase. It is useful to publish the characteristics of the set of criteria for each of the combinations tested.
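For the simplest presentation, an unweighted list, the combinations tested can be tabulated as the sensitivity and specificity of the rule "at least k of n items" for every k; each row is one point on a receiver operating characteristic curve. The Python sketch below uses hypothetical data, and it picks a threshold with Youden's index (sensitivity + specificity - 1) as one possible definition of the "best balance"; that choice, the data, and the function names are assumptions, not the method of any particular criteria set.

```python
def count_rule_table(items: list[tuple[int, ...]], gold: list[bool]) -> list[tuple[int, float, float]]:
    """Sensitivity and specificity of 'at least k of n items' for every k."""
    n_items = len(items[0])
    n_cases = sum(gold)
    n_controls = len(gold) - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need both gold-standard cases and controls")
    table = []
    for k in range(1, n_items + 1):
        meets = [sum(row) >= k for row in items]
        tp = sum(m and g for m, g in zip(meets, gold))
        tn = sum(not m and not g for m, g in zip(meets, gold))
        table.append((k, tp / n_cases, tn / n_controls))
    return table


def best_by_youden(table: list[tuple[int, float, float]]) -> tuple[int, float, float]:
    """Row maximizing Youden's index J = sensitivity + specificity - 1."""
    return max(table, key=lambda row: row[1] + row[2] - 1)


if __name__ == "__main__":
    # Hypothetical derivation sample: 4 binary items (1 = present), 6 cases then 6 controls.
    items = [
        (1, 1, 1, 0), (1, 1, 0, 0), (1, 1, 1, 1), (1, 0, 1, 0), (1, 1, 1, 0), (0, 1, 0, 0),
        (1, 0, 0, 0), (0, 0, 0, 0), (1, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 0), (0, 1, 0, 0),
    ]
    gold = [True] * 6 + [False] * 6
    table = count_rule_table(items, gold)
    for k, se, sp in table:
        print(f">= {k} items: sensitivity {se:.2f}, specificity {sp:.2f}")
    print("Youden-optimal threshold:", best_by_youden(table)[0])  # 2
```

Publishing the full table, rather than only the selected row, lets readers see how much sensitivity is traded for specificity at each threshold.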
What type of presentation is chosen for the set of criteria?
Criteria sets can be presented as a tree with one or several mandatory items. This presentation was used for the ESSG criteria for spondylarthropathy (6), where, to fulfill the criteria, a patient must present with either back pain or asymmetric synovitis. Criteria sets can also be presented as a list in which all items are equivalent and a patient must have a minimum number of items to fulfill the criteria; this is the presentation used in the ACR criteria for rheumatoid arthritis (5). Finally, criteria sets can be presented as a weighted list in which some items carry more points than others and a patient must reach a minimum number of points to fulfill the criteria; this presentation was used in the spondylarthropathy criteria of Amor et al (10).
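The 3 presentations differ only in the decision rule applied to a patient's findings, which a short Python sketch makes explicit. Everything below is hypothetical: the item names are borrowed loosely from the spondylarthropathy examples in the text, but the additional-item requirement of the tree, the minimum of 3 items, the weights, and the 5-point cut-off are assumptions chosen for illustration, not the published ESSG, ACR, or Amor rules.

```python
# Hypothetical items and thresholds; NOT the actual ESSG, ACR, or Amor items,
# weights, or cut-offs.
ENTRY_ITEMS = {"back pain", "asymmetric synovitis"}
ADDITIONAL_ITEMS = {"uveitis", "HLA-B27", "enthesitis", "family history"}
ALL_ITEMS = ENTRY_ITEMS | ADDITIONAL_ITEMS
WEIGHTS = {"back pain": 1, "asymmetric synovitis": 2, "uveitis": 2,
           "HLA-B27": 2, "enthesitis": 2, "family history": 1}


def meets_tree(findings: set[str]) -> bool:
    """Tree: a mandatory entry branch, then (assumed) at least 1 additional item."""
    return bool(findings & ENTRY_ITEMS) and bool(findings & ADDITIONAL_ITEMS)


def meets_list(findings: set[str], min_items: int = 3) -> bool:
    """List: all items count equally; a minimum number must be present."""
    return len(findings & ALL_ITEMS) >= min_items


def meets_weighted_list(findings: set[str], min_points: int = 5) -> bool:
    """Weighted list: items carry different points; a minimum total is required."""
    return sum(WEIGHTS.get(item, 0) for item in findings) >= min_points


if __name__ == "__main__":
    patients = {
        "patient 1": {"uveitis", "HLA-B27", "enthesitis"},
        "patient 2": {"back pain", "family history"},
    }
    for name, findings in patients.items():
        print(name, meets_tree(findings), meets_list(findings), meets_weighted_list(findings))
```

Under these assumed rules, patient 1 fulfills the list and weighted-list rules but not the tree, because neither mandatory entry item is present, while patient 2 fulfills only the tree. The same findings can therefore be classified differently depending on the presentation chosen.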
Has the proposed criteria set been assessed regarding psychometric properties?
A proposed set of criteria should always be assessed for psychometric properties, as defined above. However, whereas the earlier discussion concerned the psychometric properties of individual candidate criteria, here the concern is the final set of criteria as a whole. This assessment is an integral part of the elaboration of a criteria set.
External validation: is there an external validation of the proposed set of criteria?
After a set of criteria has been elaborated, it is useful to validate it in a new group of patients, different from those studied during the initial elaboration. This is called external validation. As in the elaboration study, several elements must be defined, including the source of patients and controls, the clinicians involved, and the gold standard. Ideally, the clinicians who conduct the validation should not be those proposing the set of criteria. At this phase, it may also be useful to compare existing sets of criteria with the proposed set.
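When a proposed set and an existing set are compared in the same validation sample, every subject is classified by both, so the comparison is paired. McNemar's test, which uses only the subjects on whom the 2 sets disagree, is one standard option (the text does not specify a method). The Python sketch below, with hypothetical data and an illustrative function name, applies it to gold-standard cases, which compares sensitivities; applying it to control subjects would compare specificities.

```python
import math


def mcnemar_test(proposed: list[bool], existing: list[bool]) -> tuple[float, float]:
    """Continuity-corrected McNemar test for 2 criteria sets applied to the same subjects.

    Returns the chi-square statistic (1 df) and its two-sided p-value.
    """
    if len(proposed) != len(existing):
        raise ValueError("both criteria sets must be applied to the same subjects")
    # Only discordant subjects carry information about a difference between the sets.
    only_proposed = sum(p and not e for p, e in zip(proposed, existing))
    only_existing = sum(e and not p for p, e in zip(proposed, existing))
    discordant = only_proposed + only_existing
    if discordant == 0:
        return 0.0, 1.0
    chi2 = max(0, abs(only_proposed - only_existing) - 1) ** 2 / discordant
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical validation sample of 60 gold-standard cases (compares sensitivity).
    proposed = [True] * 12 + [False] * 3 + [True] * 40 + [False] * 5
    existing = [False] * 12 + [True] * 3 + [True] * 40 + [False] * 5
    chi2, p = mcnemar_test(proposed, existing)
    print(f"chi-square {chi2:.2f}, p = {p:.3f}")
```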
The article by Johnson et al is an important milestone for methodologic aspects of classification criteria. Through their systematic review, the authors have assessed the current status of existing composite criteria, which will serve as a working agenda for further research in this field. It can also, as we have shown here, be used as a basis for preliminary recommendations for clinicians interested in elaborating new sets of criteria.