Doctors and nurses have traditionally been taught that routine monitoring of vital signs is an important way of measuring physiological functioning and determining the probability of clinical deterioration and adverse events (Evans, Hodgkinson, & Berry, 2001; Kammersgaard et al., 2001). Instructions to monitor vital signs are widely found in textbooks, in clinical teaching, and on ward rounds during which patients’ vital sign charts are studied and discussed. Although routine monitoring is daily practice in hospitals, its diagnostic effectiveness has been a point of debate for many years.
Many older studies conclude that measuring vital signs is useful. These studies suggest that changes in vital signs occur hours prior to adverse events and clinical deterioration (Goldhill, White, & Sumner, 1999; Payman, Dampier, & Hawthorn, 1989; Schein, Hazday, Pena, Ruben, & Sprung, 1990). This is in direct contrast to more recent studies, which question the relevance of routine measurements and conclude that changes in vital signs either do not occur or do not occur early enough to determine the probability of adverse events in general hospital patients (Conen, Leimenstoll, Perruchoud, & Martina, 2006; Vermeulen, Storm-Versloot, Goossens, Speelman, & Legemate, 2005; Zeitz & McCutcheon, 2006).
The prevention and early detection of clinical deterioration and adverse events is currently a major topic in quality assurance programs. Worldwide, several governmental institutes have developed guidelines on the identification of acutely ill medical patients that recommend the use of early warning scores or related systems in which vital sign measurements are combined in an overall score (Institute of Healthcare Improvement, n.d.; Smith, 2011). Implementation of these guidelines has led to a substantial increase in the measuring of vital signs. Recent literature has mainly focused on the accuracy of these early warning models, which provide clinicians with a tool for severity assessment (Gao et al., 2007; McGaughey et al., 2007). To interpret these models, knowledge of the positive likelihood ratio (LR+) for the different thresholds of each vital sign within them is important.
Measurement of vital signs is a widely used and accepted routine in hospitalized patients. However, our review revealed that this common practice is poorly studied. Only 14 observational studies and one diagnostic study were identified, and these yielded only mediocre evidence on the clinical relevance of this daily practice, especially since most studies were designed for purposes other than our primary study objective and were thus not free of potential bias. In general, all LR+ were low, but some interesting discriminative LR+ for single or combined vital signs were found.
This is illustrated by the study of Chalmers et al. (2008). They reported a promising discriminative LR+ of 4.07 for the single vital sign systolic blood pressure < 90 mmHg. The pre-test probability of 10% rose to a post-test probability of 31%, a change of 21 percentage points. However, even when vital signs deviate from the normal value and a relatively discriminative LR+ is found, the post-test probability remains rather low and the additional value to the clinician is doubtful.
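The step from pre-test to post-test probability goes through odds: the pre-test odds are multiplied by the likelihood ratio and converted back to a probability. The following minimal Python sketch reproduces the figures quoted above; the function name is illustrative and not taken from any of the reviewed studies.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    # Probability -> odds
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    # Bayes' rule in odds form: post-test odds = pre-test odds * LR
    post_test_odds = pre_test_odds * likelihood_ratio
    # Odds -> probability
    return post_test_odds / (1.0 + post_test_odds)


# Figures quoted for Chalmers et al. (2008): pre-test probability 10%, LR+ 4.07
post = post_test_probability(0.10, 4.07)
print(f"post-test probability: {post:.1%}")  # roughly 31%
```

Running the same function with a lower pre-test probability shows why a moderately discriminative LR+ adds little on a general ward: the absolute post-test probability stays small when the outcome is rare.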
Mato et al. (2009) reported on combined vital signs (temperature, heart rate, and respiratory rate). The AUC ranged from 66 to 76, which was slightly higher than their reports of single vital signs (AUC range 59–71). Lighthall et al. (2009) showed that when two or more abnormal vital signs were present simultaneously, a remarkably high LR+ of 47 was accompanied by a PPV of 78%. However, the false-negative rate is high; for example, of patients having an adverse event, only 28% had two or more abnormal vital signs.
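The figures quoted for Lighthall et al. (2009) can be related to one another with the standard definitions LR+ = sensitivity / (1 − specificity) and post-test odds = pre-test odds × LR+. The sketch below is a back-of-the-envelope check only. It assumes that the 28% is the sensitivity for the "two or more abnormal vital signs" criterion and that LR+, PPV, and sensitivity all come from the same 2×2 table; the derived specificity and pre-test probability are not reported in this review and are subject to rounding in the inputs.

```python
def implied_specificity(sensitivity: float, lr_positive: float) -> float:
    """Solve LR+ = sensitivity / (1 - specificity) for specificity."""
    return 1.0 - sensitivity / lr_positive


def implied_pre_test_probability(ppv: float, lr_positive: float) -> float:
    """Work backwards from PPV (post-test probability) to the pre-test probability."""
    post_test_odds = ppv / (1.0 - ppv)
    pre_test_odds = post_test_odds / lr_positive
    return pre_test_odds / (1.0 + pre_test_odds)


# Quoted figures (assumed to belong to the same 2x2 table)
SENSITIVITY = 0.28   # share of adverse-event patients with >= 2 abnormal vital signs
LR_POSITIVE = 47.0
PPV = 0.78

specificity = implied_specificity(SENSITIVITY, LR_POSITIVE)
pre_test = implied_pre_test_probability(PPV, LR_POSITIVE)
print(f"implied specificity: {specificity:.3f}")
print(f"implied pre-test probability: {pre_test:.3f}")
print(f"false-negative rate: {1.0 - SENSITIVITY:.0%}")
```

Under these assumptions the criterion is very specific but misses 72% of the patients who go on to have an adverse event, which is the trade-off described above.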
In addition to the poor post-test probabilities and high false-negative rates, the thresholds used and the generalizability of the findings deserve discussion. The clinical relevance of the discriminative LRs found in some of the included studies is questionable, since the vital sign thresholds used are extreme: systolic blood pressure < 90 mmHg (Chalmers et al., 2008; Gomez et al., 2006; Hoogewerf et al., 2006; Lighthall et al., 2009), oxygen saturation < 90% (Lighthall et al., 2009), and respiratory rate < 8 or ≥ 26/min (Lighthall et al., 2009). Patients with these extremes generally have easily identifiable clinical signs of deterioration for doctors and nurses with trained assessment skills or clinical judgment. Furthermore, in the study of Goldhill, McNarry, Mandersloot, and McGinley (2005), which was conducted in patients seen by an intensive care outreach service, similar discriminative LR+ were found when using the same extreme thresholds of vital signs. They also reported on less extreme thresholds, showing that differences between pre-test and post-test probability vanished.
For daily practice it is important to differentiate between thresholds. In our review, only three studies provided results for different thresholds (Mato et al., 2009; Smith et al., 2012; Vermeulen et al., 2005). Mato et al. (2009) used thresholds for heart rate > 90/min or ≥ 99/min, with no significant differences in the AUC. Vermeulen et al. (2005) conducted a diagnostic study and reported on different thresholds of body temperature measurement (BTM) in relation to infection. Their results show that BTM is of limited value in the early detection or exclusion of an infection, and the false-negative rate was rather high. Smith et al. (2012) reported each SpO2 value in relation to mortality and showed results identical to those in the study of Goldhill et al. (2005), in which differences between pre-test and post-test probability vanished. This large study, comprising 37,593 medical patients, can be used to interpret different thresholds of oxygen saturation. For example, when using the same threshold as Kline et al. (2006; oxygen saturation < 95%), the LR+ was almost equal in both populations. This suggests that the LR+ is independent of the pre-test probability for this threshold. Furthermore, Smith et al. (2012) showed an NPV of 95% for all thresholds (whether the SpO2 is 100% or lower than 88%), suggesting that for 5% of the SpO2 measurements it cannot be ruled out that the patient is at risk. Therefore, one cannot identify patients at risk with routine SpO2 measurements, and the NPV is not informative.
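Why an NPV that stays the same at every threshold carries no information can be shown with the definition of NPV. When a test does not discriminate (sensitivity + specificity = 1), the NPV reduces to 1 − prevalence, whatever the threshold. The sketch below uses a hypothetical prevalence of 5% and hypothetical sensitivity/specificity pairs purely for illustration; none of these values are taken from Smith et al. (2012).

```python
def negative_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """NPV = true negatives / (true negatives + false negatives), per unit population."""
    true_negatives = specificity * (1.0 - prevalence)
    false_negatives = (1.0 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)


PREVALENCE = 0.05  # hypothetical outcome rate, for illustration only

# Hypothetical non-discriminating thresholds: sensitivity + specificity = 1
for sensitivity, specificity in [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1)]:
    npv = negative_predictive_value(sensitivity, specificity, PREVALENCE)
    print(f"sens={sensitivity:.1f} spec={specificity:.1f} -> NPV={npv:.2f}")

# For contrast, a hypothetical discriminating test at the same prevalence
print(f"{negative_predictive_value(0.9, 0.9, PREVALENCE):.3f}")
```

In this illustration every non-discriminating threshold returns the same NPV, equal to the share of patients without the outcome, so a high but constant NPV mainly reflects how rare the outcome is rather than what the measurement adds.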
The clinical relevance or generalizability of some studies can be questioned, since specific groups of patients (e.g., community-acquired pneumonia [Chalmers et al., 2008; Hoogewerf et al., 2006], and neutropenia with fever [Gomez et al., 2006]) with high pre-test probability of mortality and ICU admission were studied. Although Chalmers et al. (2008) found some moderate differences from pre-test to post-test, two other studies found none (Gomez et al., 2006; Hoogewerf et al., 2006). The same contradiction can be seen in studies excluded from this review: Goldhill et al. (2005) showed that in patients seen by an intensive care outreach service, an increasing number of deviating vital signs was associated with higher hospital mortality. In contrast, Pedersen, Moller, and Hovhannisyan (2009) found that early detection of hypoxemia in perioperative patients did not reduce either transfer to ICU or mortality. Thus, results are contradictory, which may be due to differences in pre-test probability.
Our review demonstrates that there is still a lack of well-designed diagnostic and large observational studies specifically intended to investigate the clinical relevance of routine measurements for patients admitted to general hospital wards. Also, the definitions of “routine measurements” or “nonroutine measurements” are open to debate. This is in line with findings from other literature reviews of vital sign measurements, which conclude that there is a lack of explicit knowledge based on quantitative research (Evans et al., 2001; Lockwood, Conroy-Hiller, & Page, 2004; Pedersen et al., 2009). This suggests that much of the current practice of routinely measuring vital signs in general hospitalized patients (as well as the accuracy, frequency, and usefulness for detecting clinically relevant outcomes) is based on tradition and not yet on evidence from research.
Despite the lack of evidence, the monitoring of vital signs, or models mainly based on vital signs, currently receives a great deal of attention as part of quality and safety programs such as the Surviving Sepsis Campaign and campaigns to detect critically ill patients (VMS Veiligheids programma, 2008, 2009). Although observational studies show a relationship between outcomes and the number of patients with deviating vital signs, it is diagnostic studies that can reveal the predictive value of vital signs. Analyzing prospectively sampled large datasets of routinely collected vital signs, as Smith et al. (2012) did, has the potential to determine accuracy regarding prediction of adverse events in hospitalized patients.