We read with interest the paper by Westerhuis et al. reporting moderate inter- and intra-observer agreement on intrapartum cardiotocogram (CTG) interpretation using the four categories of the STAN guidelines.1 Agreement was high for classification of tracings as ‘normal’ and ‘preterminal’, but substantially worse for classification as ‘intermediary’ and ‘abnormal’. In a previous study, where we evaluated interobserver agreement on CTG classification and clinical decision by three clinicians using the FIGO guidelines, good agreement was also found for the classification of ‘normal’ tracings, but much lower levels were observed for the classification of ‘suspicious’ and ‘pathologic’.2

In contrast to the three-category FIGO guidelines,3 the four-category STAN guidelines use a more restrictive set of criteria to define the most severe class of CTG changes, the ‘preterminal’ classification. They also introduce the ‘intermediary’ class, inspired by FIGO’s ‘suspicious’ classification, and the ‘abnormal’ class, inspired by FIGO’s ‘pathologic’ classification but with the most ominous patterns removed. It may be preferable to refer to the STAN guidelines as being inspired by the FIGO guidelines, rather than representing them.

The satisfactory reproducibility of the ‘normal’ classification, together with its well-demonstrated high negative predictive value, is a reassuring finding regarding the method’s capacity to identify well-oxygenated fetuses. The high agreement reported for the ‘preterminal’ classification is also reassuring from the perspective of the need for intervention, although there is less hard evidence on its positive predictive value. The low reproducibility found in the assignment of the ‘intermediary’ and ‘abnormal’ classes constitutes, in our opinion, a strong argument in favour of merging these two categories. This measure was undertaken for the introduction of the STAN methodology in North America, and is clearly supported by the results of this study. Keeping a technology as simple as possible and eliminating problems with reproducibility are important for its wide dissemination and success, particularly when a large target audience, such as healthcare professionals managing intrapartum care, is involved. It is now widely accepted that reliance on visual analysis of the CTG is the weakest aspect of the STAN methodology.4

Another interesting finding in the study by Westerhuis et al.1 is the higher (but still moderate) agreement found on clinical decisions, although it cannot be ruled out that this was because of the dichotomous nature of the options (intervene–not intervene), which by itself creates fewer opportunities for disagreement. We also looked at agreement on clinical decisions in the previously mentioned study,2 allowing a three-category selection (no action, close monitoring and intervention) and, like Westerhuis et al., found higher agreement for clinical decision than for CTG interpretation. If clinicians agree more on the clinical decision based on CTG interpretation than on the classification of the CTG itself, this suggests that the latter is probably more complex than necessary, and that health professionals synthesise or simplify this information to reach their clinical decision. In our opinion, this constitutes an argument in favour of simplifying CTG classifications and making them more directly related to the decision process.


