Corresponding Author C. L. Thwaites, Oxford University Clinical Research Unit, Hospital for Tropical Diseases 190 Ben Ham Tu, District 5 Ho Chi Minh City, Vietnam. Tel.: +84-8-923-7954; Fax: +84-8-923-8904; E-mail: firstname.lastname@example.org
Objectives To create a new tetanus score and compare it with the Phillips and Dakar scores.
Methods We used prospectively acquired data from consecutive patients admitted to the Hospital for Tropical Diseases, Ho Chi Minh City, to create the Tetanus Severity Score (TSS) with multivariate logistic regression. We compared the new score with Phillips and Dakar scores by means of resubstituted and prospective data, assessing performance in terms of sensitivity, specificity and area under receiver operator characteristic curves.
Results Resubstitution testing yielded a sensitivity of 77% (298/385) and a specificity of 82% (1183/1437) for the TSS; 89% (342/385) and 20% (281/1437) for the Phillips score; and 13% (49/385) and 98% (1415/1437) for the Dakar score. The TSS showed the greatest discrimination, with an area under the receiver operator characteristic curve of 0.89 (95% CI 0.88–0.90); this was 0.74 (95% CI 0.71–0.77) for the Dakar score and 0.66 (95% CI 0.63–0.70) for the Phillips score (P values <0.001). Prospective testing showed 65% (13/20) sensitivity and 91% (210/230) specificity for the TSS; 80% (16/20) and 51% (118/230) for the Phillips score; and 25% (5/20) and 96% (221/230) for the Dakar score. The TSS achieved the greatest area under the receiver operator characteristic curve, 0.89 (95% CI 0.82–0.96), significantly greater than that of the Phillips score [0.74 (0.6–0.88), P = 0.049] but not that of the Dakar score [0.80 (0.71–0.90), P = 0.090].
Conclusions The TSS is the first prospectively developed classification scheme for tetanus and should be adopted to aid clinical triage and management and as a basis for clinical research.
Objectives To create a new tetanus score and compare it with the Phillips and Dakar scores.
Methods We used prospectively obtained data from consecutive patients admitted to the Hospital for Tropical Diseases, Ho Chi Minh City, to create the Tetanus Severity Score using multivariate logistic regression. We compared the new score with the Phillips and Dakar scores using resubstituted and prospective data, assessing performance in terms of sensitivity, specificity and area under the curve.
Results Resubstitution testing showed a sensitivity of 77% (298/385) and a specificity of 82% (1183/1437) for the Tetanus Severity Score; 89% (342/385) and 20% (281/1437) for the Phillips score; and 13% (49/385) and 98% (1415/1437) for the Dakar score. The Tetanus Severity Score showed the greatest discrimination, with an area under the curve of 0.89 (95% CI 0.88–0.90). This was 0.74 for the Dakar score (95% CI 0.71–0.77) and 0.66 for the Phillips score (95% CI 0.63–0.70) (P < 0.001). Prospective testing showed a sensitivity of 65% (13/20) and a specificity of 91% (210/230) for the Tetanus Severity Score; 80% (16/20) and 51% (118/230) for the Phillips score; and 25% (5/20) and 96% (221/230) for the Dakar score. The Tetanus Severity Score achieved the greatest area under the curve [0.89 (95% CI 0.82–0.96)], significantly greater than that of the Phillips score [0.74 (0.6–0.88); P = 0.049] but not significantly greater than that of the Dakar score [0.80 (0.71–0.90); P = 0.090].
Conclusions The Tetanus Severity Score is the first prospectively developed classification scheme for tetanus and should be adopted to aid clinical triage and management, and as a basis for clinical research.
Objectives To create a new tetanus score and compare it with the Phillips and Dakar scores.
Methods Prospective data from consecutive patients admitted to the Hospital for Tropical Diseases, Ho Chi Minh City, were used to create the Tetanus Severity Score by multivariate logistic regression. We compared the new score with the Phillips and Dakar scores using resubstituted and prospective data, evaluating performance in terms of sensitivity, specificity and area under the receiver operator characteristic curve.
Results Resubstitution testing gave a sensitivity of 77% (298/385) and a specificity of 82% (1183/1437) for the Tetanus Severity Score; 89% (342/385) and 20% (281/1437) for the Phillips score; and 13% (49/385) and 98% (1415/1437) for the Dakar score. The Tetanus Severity Score showed greater discrimination, with an area under the receiver operator characteristic curve of 0.89 (95% CI 0.88–0.90); for the Dakar score it was 0.74 (95% CI 0.71–0.77) and for the Phillips score 0.66 (95% CI 0.63–0.70) (P < 0.001). Prospective testing showed 65% (13/20) sensitivity and 91% (210/230) specificity for the Tetanus Severity Score; 80% (16/20) and 51% (118/230) for the Phillips score; and 25% (5/20) and 96% (221/230) for the Dakar score. The Tetanus Severity Score achieved the greatest area under the receiver operator characteristic curve, 0.89 (95% CI 0.82–0.96), significantly greater than that of the Phillips score [0.74 (0.6–0.88), P = 0.049] but not that of the Dakar score [0.80 (0.71–0.90), P = 0.090].
Conclusions The Tetanus Severity Score is the first prospectively developed classification system for tetanus and should be adopted to aid clinical triage and management, and as a basis for clinical research.
Early recognition of severe tetanus allows prompt institution of intensive care management and may improve outcome (Edmondson & Flowers 1979; Trujillo et al. 1980, 1987). In the settings where most tetanus occurs (WHO 2001), early identification of critically ill patients and effective triage can help make optimal use of scarce resources. A statistical model that accurately predicts outcome without requiring expensive or invasive investigations would be valuable throughout the world to assist clinical management.
An accurate prognostic model would also enhance research. In clinical trials, prognostic scores can be used to set appropriate entry criteria in order to select the group most likely to benefit from an intervention, increasing the chance of a significant result and reducing potential harm to those unlikely to benefit. In studies recruiting a wide range of patients, patients can be stratified to balance treatment groups (Knaus 1996). By comparing predicted and actual case fatality, the effect of therapy can be determined irrespective of variations in case-mix.
Disease severity scoring systems predict outcome for groups of patients. Scores can be specific to a particular disease or applicable to a wide variety of patients. The natural history of tetanus follows a typical course, beginning with lockjaw and muscle stiffness and progressing to muscle spasms and, in severe cases, cardiovascular instability associated with autonomic dysfunction (Udwadia 1994; Cook et al. 2001). At presentation most patients are relatively well, but a substantial proportion deteriorate in the days after admission. An ideal score will identify those at risk of poor outcome soon after admission. At that stage, the biochemical parameters on which general scores such as Acute Physiology and Chronic Health Evaluation (APACHE) are largely based are unlikely to be abnormal, so different variables need to be examined.
We selected two commonly used scores, Phillips and Dakar, which were published decades ago without validation data (Phillips 1967; Vakil 1975), and compared them with a new score created using logistic regression.
The study was approved by the Scientific and Ethical Committee of the Hospital for Tropical Diseases (HTD), Ho Chi Minh City, Vietnam.
Patients and data
The study was conducted in a dedicated Tetanus Unit at the Hospital for Tropical Diseases (HTD), which treats patients with tetanus from the local community and the whole of southern Vietnam (approximate population 35 million). Consecutive patients aged ≥1 year with a clinical diagnosis of tetanus admitted between April 1993 and December 2003 were enrolled. Because of administrative changes, no patients were enrolled between October 1996 and May 1997 or between January and March 2000.
All data were collected prospectively on standardized forms by the attending physician responsible for the patient's care and then checked by the Unit Director. A total of 2433 patients were admitted between April 1993 and December 2002 (Thwaites et al. 2004). Because of missing data and exclusions during the logistic regression process, the new score could be calculated for 1824 of these patients. Most exclusions were a consequence of the logistic regression process itself: when a large number of variables are entered into a model, any patient with even one missing variable is automatically excluded from further analyses. Two patients had missing Phillips or Dakar scores, leaving data from 1822 patients to compare the new score with the Phillips and Dakar scores. Prospective evaluation of the new score was performed using data from patients admitted between January and December 2003. Of the 253 patients admitted, three were excluded because of insufficient data, leaving records from 250 patients to test and compare the new score prospectively. Data included demographic details, features of the history, signs, symptoms and laboratory investigations at presentation, and Phillips and Dakar scores. Intercurrent illness was scored using modified ASA criteria (American Society of Anesthesiologists 1963): 1 – none: normal healthy patient; 2 – mild: mild systemic disease; 3 – moderate: severe systemic disease that limits activity but is not incapacitating; 4 – severe: incapacitating systemic disease that is a constant threat to life; 5 – life threatening: moribund patient not expected to survive 24 h with or without operation.
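The exclusion mechanism described above (complete-case, or listwise, analysis) can be sketched in a few lines of Python. The field names and records are hypothetical illustrations, not study data; the sketch only shows why each additional model variable can shrink, but never enlarge, the set of patients available for analysis.

```python
from typing import Optional

# Hypothetical patient records; None marks a missing value.
Record = dict[str, Optional[float]]

def complete_cases(records: list[Record], variables: list[str]) -> list[Record]:
    """Return only the records with a non-missing value for every listed variable."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]

patients: list[Record] = [
    {"age": 45, "max_heart_rate": 120, "max_temperature": 38.5},
    {"age": 60, "max_heart_rate": None, "max_temperature": 37.9},  # one value missing
    {"age": 30, "max_heart_rate": 98, "max_temperature": None},    # one value missing
    {"age": 52, "max_heart_rate": 110, "max_temperature": 37.2},
]

# With one variable every patient is usable; with three, two patients drop out.
print(len(complete_cases(patients, ["age"])))  # 4
print(len(complete_cases(patients, ["age", "max_heart_rate", "max_temperature"])))  # 2
```

With many candidate variables, a patient needs only one gap to be lost, which is why the usable sample fell well below the number admitted.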
The Phillips score consists of four variables: incubation period, site of infection, state of immune protection and complicating factors. The first two variables are each scored from 1 to 5 and the last two from 0 to 10, giving a maximum total of 30; higher scores are associated with worse outcome. The Dakar score consists of six variables: incubation period <7 days, period of onset <48 h, ‘high-risk’ entry site, and presence of fever, spasms and tachycardia on admission. Each variable is scored 0 (absent) or 1 (present), giving a maximum score of six for disease with the worst prognosis. Outcome was classified as ‘survived (to discharge from hospital)’ or ‘died’.
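The arithmetic of the two existing scores can be sketched as follows. This is a minimal illustration, not a clinical tool: the rules that convert clinical findings into Phillips sub-scores, and the thresholds that define fever and tachycardia for the Dakar score, are not given in this text, so the sketch takes them as inputs. All class, function and parameter names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PhillipsComponents:
    """Already-assigned sub-scores; the mapping from clinical findings is not given here."""
    incubation_period: int     # scored 1-5
    site_of_infection: int     # scored 1-5
    immune_protection: int     # scored 0-10
    complicating_factors: int  # scored 0-10

PHILLIPS_RANGES = {
    "incubation_period": (1, 5),
    "site_of_infection": (1, 5),
    "immune_protection": (0, 10),
    "complicating_factors": (0, 10),
}

def phillips_score(c: PhillipsComponents) -> int:
    """Sum the four sub-scores after range-checking them (maximum 30)."""
    total = 0
    for name, (low, high) in PHILLIPS_RANGES.items():
        value = getattr(c, name)
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
        total += value
    return total

def dakar_score(incubation_days: float, onset_hours: float, high_risk_site: bool,
                fever: bool, spasms: bool, tachycardia: bool) -> int:
    """One point per adverse feature present (maximum 6)."""
    items = [incubation_days < 7, onset_hours < 48, high_risk_site,
             fever, spasms, tachycardia]
    return sum(int(item) for item in items)

print(phillips_score(PhillipsComponents(5, 4, 8, 1)))  # 18
print(dakar_score(5, 36, True, False, True, True))       # 5
```

Keeping the Phillips sub-scores as explicit inputs makes the unknown mapping visible rather than hiding a guessed one inside the function.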
All data were entered into a computer database (Microsoft Excel, Microsoft, Redmond, WA, USA). Analyses were performed using SPSS version 10.0 for Windows (SPSS Inc., Chicago, IL, USA) and STATA version 8.0 (StataCorp, College Station, TX, USA).
A total of 32 clinical and laboratory features recorded on admission were available for analysis. Univariate analysis was performed to examine variables associated with death, using Mann–Whitney U-tests for continuous variables and χ² tests for categorical variables. Multivariate logistic regression was used to model the probability of death, with the results of the univariate analysis guiding variable selection. Variables with small numbers of observations were excluded to maximize the sample size. Stepwise forward-entry logistic regression was used to select variables associated with poor outcome, with a p-to-enter of ≤0.05 and a p-to-reject of ≥0.051. Continuous variables were transformed into categorical variables before entry into the model. Entry sites were dichotomized into high-risk (internal or injection) and low-risk (all other) sites according to mortality rates. To create the Tetanus Severity Score (TSS), the regression co-efficients were trebled and rounded.
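The conversion of regression co-efficients into integer points can be sketched as below. The coefficient values and variable names are hypothetical and are not those of Table 3. The text does not say how halves were rounded, so round-half-away-from-zero is an assumption here (Python's built-in `round` would instead round halves to the nearest even number).

```python
import math

# Hypothetical coefficients (log-odds per category) for illustration only;
# these are NOT the values reported in Table 3.
COEFFICIENTS = {"var_a": 1.38, "var_b": 0.52, "var_c": 0.95, "var_d": -0.67}

def round_half_away(x: float) -> int:
    """Round to the nearest integer, with halves rounded away from zero (assumed)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

def coefficients_to_points(coefs: dict[str, float], multiplier: float = 3.0) -> dict[str, int]:
    """Scale each coefficient by the multiplier, then round to integer points."""
    return {name: round_half_away(multiplier * beta) for name, beta in coefs.items()}

def total_points(points: dict[str, int], present: set[str]) -> int:
    """Total score for a patient: sum of points for the categories that apply."""
    return sum(points[name] for name in present)

points = coefficients_to_points(COEFFICIENTS)
print(points)  # {'var_a': 4, 'var_b': 2, 'var_c': 3, 'var_d': -2}
print(total_points(points, {"var_a", "var_c"}))  # 7
```

Multiplying before rounding preserves more of the relative differences between co-efficients. With these hypothetical values, rounding the raw co-efficients would give `var_b` (0.52) and `var_c` (0.95) one point each, whereas after trebling they receive 2 and 3 points.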
The performance of the new score was assessed using resubstitution and prospective data test methods. For all three scores, sensitivity and specificity were calculated and Receiver Operator Characteristic (ROC) curves plotted. 95% confidence intervals for sensitivity and specificity were calculated using the Normal approximation to the Binomial distribution. A ROC curve is a plot of sensitivity against 1 − specificity, i.e. the true-positive rate against the false-positive rate, across all possible cut-points. The area under the curve summarizes discrimination: it equals the probability that a randomly chosen non-survivor has a higher score than a randomly chosen survivor (ties counting one half), so high values (>0.8) represent good discrimination and a value of 0.5 corresponds to no discrimination. Agreement of predictions between scores was tested using paired binomial (McNemar) tests.
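The quantities defined above can be computed as follows. This is a minimal sketch: the "score ≥ cut means predicted death" convention follows the table footnotes, the confidence interval is the Wald (Normal-approximation) interval named in the text, and the area under the curve is computed by pairwise comparison (the Mann–Whitney formulation), which equals the trapezoidal area under the empirical ROC curve. The text does not state how the study software computed these values, and the toy scores are invented.

```python
import math
from typing import Sequence

def proportion_with_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Proportion with a Wald (Normal-approximation) 95% CI, clipped to [0, 1]."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

def sensitivity_specificity(scores: Sequence[float], died: Sequence[bool],
                            cut: float) -> tuple[float, float]:
    """Treat score >= cut as 'predicted death'."""
    tp = sum(1 for s, d in zip(scores, died) if d and s >= cut)
    fn = sum(1 for s, d in zip(scores, died) if d and s < cut)
    tn = sum(1 for s, d in zip(scores, died) if not d and s < cut)
    fp = sum(1 for s, d in zip(scores, died) if not d and s >= cut)
    return tp / (tp + fn), tn / (tn + fp)

def empirical_auc(scores: Sequence[float], died: Sequence[bool]) -> float:
    """P(score of a non-survivor > score of a survivor), ties counted as 1/2."""
    deaths = [s for s, d in zip(scores, died) if d]
    survivors = [s for s, d in zip(scores, died) if not d]
    wins = 0.0
    for x in deaths:
        for y in survivors:
            wins += 1.0 if x > y else 0.5 if x == y else 0.0
    return wins / (len(deaths) * len(survivors))

# TSS resubstitution sensitivity from the text: 298 of 385 deaths predicted.
print(proportion_with_ci(298, 385))  # approx (0.774, 0.732, 0.816)

# Toy data (not study data).
toy_scores = [9, 7, 8, 3, 5, 8]
toy_died = [True, True, False, False, False, True]
print(sensitivity_specificity(toy_scores, toy_died, cut=8))
print(empirical_auc(toy_scores, toy_died))
```

The pairwise loop is quadratic in sample size, which is adequate for a few thousand patients; statistical packages use rank-based shortcuts that give the same value.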
For these analyses, we selected cut-points of Dakar score >3 and Phillips score >14, a priori. The original descriptions of these scores do not give exact scores predictive of death. A Dakar score >3 was chosen, as this is the figure used in the first report of its use at the Fifth International Conference on Tetanus (Gallais et al. 1978). A cut-point >14 was chosen for the Phillips score, as in its original description, a score >14 was noted to be associated with ‘severe disease, with survival depending on the quality of treatment’ (Phillips 1967).
For patients admitted from 1993 to 2002, we compared characteristics on admission using univariate analysis (Tables 1 and 2). All continuous variables except lowest systolic blood pressure during the first day in hospital were significantly different between those who died and those who survived (P < 0.001 for all variables). All categorical variables except open fracture, cranial nerve involvement and muscle stiffness on admission were also significantly different between the two groups (P-values <0.012).
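The univariate comparisons named in the methods (Mann–Whitney U-tests for continuous variables, χ² tests for categorical variables) can be sketched without a statistics library. This version rests on stated assumptions: the U-test uses the Normal approximation without a tie correction, and the χ² test is the uncorrected Pearson test for a 2 × 2 table. The study software may have applied tie or continuity corrections; the text does not say. The toy inputs are not study data.

```python
import math
from typing import Sequence

def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]], 1 df."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p

def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """U for sample x and a two-sided P from the Normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))

# Toy inputs (not study data).
print(chi_square_2x2(10, 20, 30, 40))        # chi-square approx 0.794
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # U = 0.0, P approx 0.05
```

For real analyses an established statistics package is preferable, because it handles exact small-sample distributions and corrections consistently.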
Table 1. Results of univariate analysis of continuous data for symptoms/signs present on/before admission to hospital. Values are median (interquartile range)
Incubation period (days)
Period of onset (h)
Highest systolic BP during first day (mmHg)
Lowest systolic BP during first day (mmHg)
Highest HR during first day (bpm)
Lowest HR during first day (bpm)
Highest temperature during first day (°C)
Lowest temperature during first day (°C)
White blood cell count (10³/µl)
Serum creatinine (mg/dl)
Serum glucose (mg/dl)
Table 2. Results of univariate analysis of categorical data for symptoms present on/before admission to hospital
Proportion dying (N = 2421)
* Entry sites are mutually exclusive and refer to the location of superficial wounds, except for ‘internal’ sites, which include post-operative/post-partum wounds or open fractures, and ‘injection’ sites, which include intramuscular, subcutaneous or intravenous injections as entry points.
† HIV testing was not routinely performed.
Tetanus due to injection
Extent of tetanus
Cranial nerve involvement
Intercurrent illness (ASA grade)
5 Life threatening
Results of the final multivariate logistic regression using the complete dataset are shown in Table 3. The co-efficients produced by this regression were used to create the final score, shown in Table 4. A cut-point of eight was selected to divide ‘predicted survival’ from ‘predicted death’ (≥8 indicating predicted death), in order to optimize sensitivity and specificity.
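The text says only that the cut-point of eight was chosen to optimize sensitivity and specificity. One common way to formalize this, assumed here rather than taken from the study, is to choose the cut-point that maximizes Youden's J = sensitivity + specificity − 1. The function name and toy data are hypothetical.

```python
from typing import Optional, Sequence

def best_cut_point(scores: Sequence[int],
                   died: Sequence[bool]) -> tuple[int, float, float, float]:
    """Try every observed score as a cut-point (score >= cut means 'predicted death')
    and return (cut, Youden J, sensitivity, specificity) for the largest J.
    Needs at least one death and one survivor."""
    best: Optional[tuple[int, float, float, float]] = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, d in zip(scores, died) if d and s >= cut)
        fn = sum(1 for s, d in zip(scores, died) if d and s < cut)
        tn = sum(1 for s, d in zip(scores, died) if not d and s < cut)
        fp = sum(1 for s, d in zip(scores, died) if not d and s >= cut)
        sens, spec = tp / (tp + fn), tn / (tn + fp)
        j = sens + spec - 1
        if best is None or j > best[1]:  # on ties, keep the lower cut-point
            best = (cut, j, sens, spec)
    if best is None:
        raise ValueError("no scores supplied")
    return best

# Toy data (not study data).
toy_scores = [2, 4, 5, 6, 7, 8, 9, 10, 11, 12]
toy_died = [False, False, False, False, False, True, False, True, True, True]
print(best_cut_point(toy_scores, toy_died))  # cut 8: sensitivity 1.0, specificity 5/6
```

Youden's J weights sensitivity and specificity equally. If missing a death is judged worse than over-triaging a survivor, a weighted criterion would choose a lower cut-point.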
Table 3. Logistic regression co-efficients obtained using the 1993–2002 dataset
Table 4. New prognostic score: Tetanus Severity Score (TSS). The final score is calculated from the total of individual section scores
* Defined according to ASA physical status scale.
† ‘Internal’ site includes post-operative/post-partum wounds or open fractures; ‘injection’ includes intramuscular, subcutaneous or intravenous injections.
Time from first symptom to admission (days)
Difficulty breathing on admission
Co-existing medical conditions*
Fit and well
Minor illness or injury
Moderately severe illness
Severe illness not immediately life threatening
Immediately life-threatening illness
Internal or injection
Other (including unknown)
Highest systolic blood pressure recorded during first day in hospital (mmHg)
Highest heart rate recorded during first day in hospital (bpm)
Lowest heart rate recorded during first day in hospital (bpm)
Highest temperature recorded during first day in hospital (°C)
When tested with resubstituted data, the new score, named the TSS, had a sensitivity of 77% (298/385) and a specificity of 82% (1183/1437) for a fatal outcome (Table 5). On the same data, the Dakar score had a very low sensitivity of 13% (49/385), although a high specificity of 98% (1415/1437). The Phillips score had a higher sensitivity of 89% (342/385), but a specificity of only 20% (281/1437). ROC curves were constructed for all three scores and are shown in Figure 1. Discriminative power, as measured by the area under the curve, was greatest for the TSS, at 0.89 (95% CI 0.88–0.90), compared with 0.74 (0.71–0.77) for the Dakar score and 0.66 (0.63–0.70) for the Phillips score (P-values <0.001).
Table 5. Sensitivity and specificity of Dakar, Phillips and Tetanus Severity scores [TSS (1993–2002 data)]
% (95% CI)
% (95% CI)
Cut-points used to predict death were ≥8 for TSS, ≥3 for Dakar and ≥14 for Phillips.
Prospective testing of the scores, using the most current data (from 2003), produced the following results (Table 6). The TSS had a sensitivity of 65% (13/20) and specificity of 91% (210/230). The Phillips score had sensitivity of 80% (16/20) and specificity of 51% (118/230) and the Dakar score had sensitivity of 25% (5/20) and specificity of 96% (221/230).
Table 6. Sensitivity and specificity of Dakar, Phillips and Tetanus Severity scores (TSS): 2003 data
Cut-points used to predict death were ≥8 for TSS, ≥3 for Dakar and ≥14 for Phillips.
ROC curves are shown in Figure 2. The TSS had the greatest area under the curve, 0.89 (95% CI 0.82–0.96). This area was not significantly different from that of the Dakar score [area under curve 0.80 (0.71–0.90), P = 0.090], but was significantly greater than that of the Phillips score [area under curve 0.74 (0.6–0.88), P = 0.049].
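The text does not say which test produced the P values for the differences between areas. Because all scores are measured on the same patients, the areas are correlated. One widely used method for this case is the nonparametric approach of DeLong, DeLong and Clarke-Pearson (1988), which Stata's `roccomp` command offers. The sketch below assumes that approach; it is not confirmed to be the method used here, and all function names and data are hypothetical.

```python
import math
from typing import Sequence

def _psi(x: float, y: float) -> float:
    """Kernel: 1 if the non-survivor scores higher, 1/2 for a tie, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0

def _components(score: Sequence[float], pos: list[int], neg: list[int]):
    """DeLong structural components (per death, per survivor) and the AUC."""
    v10 = [sum(_psi(score[i], score[j]) for j in neg) / len(neg) for i in pos]
    v01 = [sum(_psi(score[i], score[j]) for i in pos) / len(pos) for j in neg]
    return v10, v01, sum(v10) / len(pos)

def _cov(u: list[float], v: list[float], mu_u: float, mu_v: float) -> float:
    """Sample covariance with an (n - 1) denominator."""
    return sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v)) / (len(u) - 1)

def delong_paired_test(score_a: Sequence[float], score_b: Sequence[float],
                       died: Sequence[bool]) -> tuple[float, float, float]:
    """Return (AUC_a, AUC_b, two-sided P) for two scores on the same patients."""
    pos = [i for i, d in enumerate(died) if d]
    neg = [i for i, d in enumerate(died) if not d]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two deaths and two survivors")
    a10, a01, auc_a = _components(score_a, pos, neg)
    b10, b01, auc_b = _components(score_b, pos, neg)
    m, n = len(pos), len(neg)
    var_a = _cov(a10, a10, auc_a, auc_a) / m + _cov(a01, a01, auc_a, auc_a) / n
    var_b = _cov(b10, b10, auc_b, auc_b) / m + _cov(b01, b01, auc_b, auc_b) / n
    cov_ab = _cov(a10, b10, auc_a, auc_b) / m + _cov(a01, b01, auc_a, auc_b) / n
    var_diff = var_a + var_b - 2 * cov_ab
    if var_diff <= 0:
        return auc_a, auc_b, float("nan")  # difference has no estimable variance
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    return auc_a, auc_b, math.erfc(abs(z) / math.sqrt(2))

# Toy data (not study data): score A separates the groups perfectly.
died = [True, True, True, False, False, False, False]
score_a = [9, 8, 7, 3, 4, 5, 6]
score_b = [5, 8, 3, 4, 7, 2, 6]
print(delong_paired_test(score_a, score_b, died))  # AUC_a = 1.0, AUC_b = 7/12, P approx 0.13
```

The covariance term is what distinguishes a paired comparison from treating the two areas as independent. When two scores rank patients similarly, the covariance is positive and the test becomes more sensitive to a given difference.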
Compared with the Dakar score, the TSS was significantly better at predicting deaths: predicting death in nine cases when Dakar failed, and failing in only one case when Dakar succeeded (P = 0.011). However, the TSS was worse at predicting survivors. It correctly predicted five survivors Dakar did not but missed 16 survivors Dakar correctly identified (P = 0.013).
Compared with Phillips, TSS was no worse at predicting deaths: TSS correctly predicted two deaths not predicted by Phillips, and Phillips predicted five missed by TSS (P = 0.227). However, the TSS was significantly better at predicting survivors, predicting survival in 96 cases when Phillips did not and missing only four survivors Phillips correctly identified (P < 0.001).
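The paired binomial (McNemar) comparisons above use only the discordant pairs: patients for whom one score's prediction was correct and the other's was not. A minimal exact version is sketched below (the function name is hypothetical). The text does not say whether its P values are one- or two-sided. For the discordant counts it reports, the one-sided exact probabilities are 0.011, 0.013 and 0.227, the same as the quoted P values; the two-sided probabilities are twice these.

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> tuple[float, float]:
    """Exact (binomial) McNemar test on the two discordant counts.

    Under the null hypothesis each discordant pair is equally likely to favour
    either score, so min(b, c) follows Binomial(b + c, 0.5).
    Returns (one-sided P, two-sided P).
    """
    n = b + c
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return tail, min(1.0, 2 * tail)

# Discordant counts quoted in the text: (TSS correct only, comparator correct only).
comparisons = {
    "deaths, TSS vs Dakar": (9, 1),
    "survivors, TSS vs Dakar": (5, 16),
    "deaths, TSS vs Phillips": (2, 5),
    "survivors, TSS vs Phillips": (96, 4),
}
for label, (b, c) in comparisons.items():
    one_sided, two_sided = mcnemar_exact(b, c)
    print(f"{label}: one-sided P = {one_sided:.3f}, two-sided P = {two_sided:.3f}")
```

Concordant pairs (both scores right, or both wrong) carry no information about which score is better, which is why only the discordant counts enter the test.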
Mortality rates are shown in Table 7. As there were only 20 deaths in the 2003 data, the two sets were combined. Although the mortality of those with ‘high-risk’ Dakar score (>3) is highest, mortality of those categorized as ‘low-risk’ is also high. The TSS, however, separates the two groups and provides a clinically more useful prognostic indicator.
Table 7. Mortality rates for high- and low-risk groups using complete dataset (1993–2003)
Cut points used for high-risk were ≥8 for TSS, ≥3 for Dakar and ≥14 for Phillips. Low-risk was defined as values below these.
This is the largest study to validate prognostic scores in tetanus. There are few published data to support the use of the Phillips or Dakar scores. In a series of 460 patients, Gallais et al. reported a case fatality of >75% associated with a Dakar score >3, but gave no further data on the performance of the score (Gallais et al. 1978). We are unaware of data from any other centre validating the Phillips score. The case fatality observed in this study was lower than that reported by Gallais et al.: in the 1993–2002 data, the case fatality rate among patients with a Dakar score >3 was 69%. Both the Phillips and Dakar scores discriminated poorly between survivors and non-survivors. Although sensitive (89%), the Phillips score was not specific (20%); conversely, the Dakar score was highly specific (98%) but insensitive (13%). The TSS, however, had a sensitivity of 77% and a specificity of 82%, and showed significantly better discrimination. Similar values were obtained in the prospective evaluation, suggesting that the TSS is superior for use in both trials and clinical practice.
When the scores were tested against the prospective (2003) dataset, the Phillips and Dakar scores showed some improvement: sensitivity of the Dakar score increased to 25% and specificity of the Phillips score rose to 51%. However, given the small number of deaths and wide confidence intervals, these apparent changes in performance should be interpreted with caution. Prospective testing showed a fall in sensitivity of the TSS to 65% (13/20) and an increase in specificity to 91% (210/230), but no change in area under the ROC curve. This may be due to changes in case-mix or in clinical practice. It is possible that changes in management during the study period improved outcome in patients with high scores, thereby altering the sensitivity. Case fatality rates did indeed fall during the study (24% in 1993 vs. 8% in 2003; Thwaites et al. 2004), and this may have contributed to the effect. The intention of this project was to validate a score with data as recent as possible, in order to optimize its use in current and future clinical practice and research. The most recent data were therefore used for the final validation, and despite the change in mortality rates, the TSS continued to perform well.
ROC curve analysis showed that TSS discriminated significantly better than Phillips, but although higher, the value was not significantly different from that of the Dakar score. The TSS was, however, more sensitive than Dakar and better at predicting non-survivors. In countries where most tetanus occurs, facilities are often limited, and this improved ability to identify high-risk patients is important as it allows appropriate targeting of resources to those most in need. Even in countries with good intensive care facilities, it is important to rapidly identify those likely to deteriorate. The TSS requires vital signs data from the first 24 h, but these variables are cheap and easy to record even in centres with limited resources.
There are several reasons why this new prognostic score may be superior to the old scoring methods. Firstly, it was constructed with more statistically robust methods. It is unclear how the Phillips and Dakar scores were constructed: the authors may have used univariate analysis or simply personal clinical experience. Although still subject to some limitations, such as the assumption of a linear relationship between continuous predictors and the log-odds of the outcome (Ridley 2002), logistic regression takes account of the relationships between predictor variables, which is not possible using univariate analysis (Armitage et al. 2002). Many of the clinical features we studied, such as spasms and heart rate, are likely to be related, so this method is especially useful. A large number of predictor variables were available for analysis in this study, which may have resulted in a more accurate score. Univariate analyses showed that most of the variables were significantly associated with outcome, but many were discarded during multivariate analysis. For example, incubation period, a factor often cited as a good indicator of prognosis and included in both the Phillips and Dakar scores (Patel et al. 1963; Udwadia 1994), was not included in the final model. However, time from first symptom to admission was selected; this may be a more reliable indicator of the speed of disease progression, as it does not rely on subjective assessment of where and when the initial infection occurred.
Multivariate logistic regression has been employed before to produce a prognostic score for tetanus. Armitage and Clifford (1978) used the technique to construct a score that divided patients into three prognostic groups based on clinical features. However, although mortality rates varied markedly between the prognostic groups, the score discriminated between outcomes poorly. The initial sensitivity and specificity of the score, selecting the group with the worst outcome, were 24% and 94%, becoming 53% and 38% when tested prospectively.
Temporal, geographical and demographic differences affect the accuracy of scores, and changes in performance when scores are used in different settings or patient groups are well described (Rivera-Fernandez et al. 1998; Beck et al. 2003). Treatment of tetanus has changed significantly over the last 30 years. Intensive care facilities and nursing care have improved, interventions such as intermittent positive-pressure ventilation (IPPV) with non-depolarizing neuromuscular blocking agents have become common (Brauner et al. 2002; de Miranda-Filho et al. 2004), and therapies such as renal support are now employed in some centres (Asherton & Ruttmann 2002). All of these factors will affect patients’ outcomes and hence the performance of the model. The score now requires validation elsewhere to determine its performance in different groups of patients.
None of the scores tested in this study is able to predict an individual patient's outcome precisely enough to form the sole basis of treatment decisions (Ridley 2002). However, the results of this study show that the TSS is a valuable indicator of prognosis, suitable for adoption as an international standard to improve management and strengthen research worldwide.