A Costa, Departamento de Ginecologia e Obstetrícia, Faculdade de Medicina da Universidade do Porto, Alameda Hernani Monteiro, 4200-319 Porto, Portugal. Email firstname.lastname@example.org
Please cite this paper as: Costa A, Santos C, Ayres-de-Campos D, Costa C, Bernardes J. Access to computerised analysis of intrapartum cardiotocographs improves clinicians’ prediction of newborn umbilical artery blood pH. BJOG 2010;117:1288–1293.
Objective To evaluate the impact of access to computerised cardiotocograph (CTG) analysis on reproducibility and accuracy of clinicians’ predictions of umbilical artery blood pH (UAB pH) and 5-minute Apgar score.
Design Prospective evaluation of pre-recorded cases.
Setting A tertiary-care university hospital.
Population From databases of intrapartum CTGs acquired in singleton term pregnancies, 204 tracings with low signal loss and short time interval to delivery were consecutively selected.
Methods Tracings were randomly assigned to computer analysis by the Omniview-SisPorto 3.5® system (study group n = 104) or to no analysis (control group n = 100). Three experienced clinicians evaluated all tracing printouts independently and were asked to predict the newborns’ UAB pH and 5-minute Apgar scores from them.
Main outcome measures Interobserver agreement (measured by the intraclass correlation coefficient [ICC]) and accuracy in prediction of neonatal outcomes with 95% CI.
Results Agreement on prediction of UAB pH was significantly higher in the study group (ICC = 0.70; 95% CI 0.61–0.77) than in the control group (ICC = 0.43; 95% CI 0.21–0.60), and a trend towards better agreement was also seen in estimation of 5-minute Apgar scores (ICC = 0.55; 95% CI 0.38–0.68 versus ICC = 0.43; 95% CI 0.25–0.57). Observers predicted UAB pH values correctly within a 0.10 margin in 70% of cases in the study group (95% CI 0.61–0.79) versus 46% in the control group (95% CI 0.35–0.56). They predicted 5-minute Apgar scores within a margin of one in 81% of cases in the study group (95% CI 0.73–0.88) and in 70% of cases in the control group (95% CI 0.61–0.79).
Conclusions Prediction of UAB pH is more reproducible and accurate when clinicians have access to computerised analysis of CTGs.
Interobserver and intraobserver variability remains one of the main weaknesses of intrapartum cardiotocographic (CTG) monitoring,1–3 and computerised analysis of CTGs has been proposed as an alternative to overcome this limitation. The Omniview-SisPorto® 3.5 system (Speculum, Lisbon, Portugal) is a program for computer analysis of CTG and ST event signals, developed over the last 21 years at the University of Porto.4,5 Computerised analysis follows the classical steps of visual CTG analysis: baseline rate estimation, identification of accelerations, identification of decelerations, identification of contractions, and quantification of short-term and long-term variability. The system is extensively described elsewhere.5
Pathological alerts elicited by the Omniview-SisPorto 3.5 system have been shown to be highly predictive of fetuses born with umbilical artery blood (UAB) acidaemia.6 Despite such promising results, it is unlikely that computerised systems will replace healthcare professionals in intrapartum management decisions in the near future. Not only would a more extensive evaluation of their validity and effectiveness be required, but several medico-legal issues would also need to be adequately addressed.
Management decisions taken by healthcare professionals, based on visual analysis of the intrapartum CTG, have been shown to be poorly reproducible,1,7 but no studies have looked at this issue when computerised CTG analysis is made available. There are also no data on the accuracy of newborn outcome prediction by healthcare professionals when they are given access to computerised intrapartum CTG.
The aim of this study was to evaluate whether access to computerised CTG analysis affects reproducibility and accuracy of clinicians’ predictions of newborn UAB pH and 5-minute Apgar scores. If access to computerised CTG analysis results in increased reproducibility and accuracy in prediction of newborn outcome, this would suggest that this methodology could lead to a higher effectiveness. If no such effect should be observed, it would suggest that improved effectiveness is unlikely, at least as long as management decisions are left in the hands of healthcare professionals.
Cases were selected from two pre-existing databases of intrapartum CTG tracings collected for research purposes in a tertiary-care university hospital.6,8 The databases were searched and cases were consecutively selected if they fulfilled the following criteria: singleton pregnancies, more than 36 completed gestational weeks, fetus in a cephalic presentation, absence of known fetal malformations, active phase of labour, generally accepted indication for internal fetal heart rate (FHR) monitoring (poor signal quality, heavy meconium staining, high-risk pregnancy, etc.), a minimum of 60 minutes of tracing duration, signal loss in the last hour <20%, no complications with the potential to influence fetal oxygenation occurring between tracing end and delivery (difficult vaginal or abdominal fetal extractions, cord prolapse, maternal hypotension, shoulder dystocia, etc.), and no anaesthetic complications taking place at the time of surgery. Cases were subsequently excluded if the time interval between tracing end and vaginal delivery exceeded 5 minutes or if the interval between tracing end and caesarean birth exceeded 20 minutes. In all cases the umbilical cord was double-clamped immediately after birth and blood was drawn from both artery and vein into previously heparinised syringes. After vestigial air was expelled, blood gas analysis was carried out within 30 minutes after birth. Cases were excluded from UAB pH analysis if paired samples were not obtained, if pH values between the two samples differed by less than 0.03 units or if values of partial pressure of CO2 between the two samples differed by less than 7.5 mmHg.9–11 Apgar scores were evaluated by the health professional responsible for immediate neonatal support, in the majority of cases this being the attending midwife.
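The paired-sample validity rule described above can be written as a short check. This is a minimal sketch in Python, not code from the study: the `CordSample` record, the function name and the use of absolute differences are assumptions made for illustration; only the two thresholds (0.03 pH units and 7.5 mmHg) come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds quoted in the text; the constant names are illustrative.
MIN_PH_DIFFERENCE = 0.03         # pH units
MIN_PCO2_DIFFERENCE_MMHG = 7.5   # mmHg


@dataclass
class CordSample:
    """Blood gas values from one cord vessel (hypothetical record type)."""
    ph: float
    pco2_mmhg: float


def is_valid_pair(artery: Optional[CordSample], vein: Optional[CordSample]) -> bool:
    """Return True when a paired sample is usable for UAB pH analysis.

    A case is rejected if either sample is missing, or if the two samples
    are so similar that they may have come from the same vessel.
    """
    if artery is None or vein is None:
        return False
    # Rounding avoids floating-point noise at the threshold itself.
    ph_gap = round(abs(vein.ph - artery.ph), 3)
    pco2_gap = round(abs(artery.pco2_mmhg - vein.pco2_mmhg), 3)
    return ph_gap >= MIN_PH_DIFFERENCE and pco2_gap >= MIN_PCO2_DIFFERENCE_MMHG


if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(is_valid_pair(CordSample(7.20, 55.0), CordSample(7.30, 42.0)))
    print(is_valid_pair(CordSample(7.25, 50.0), CordSample(7.26, 49.0)))
```

Requiring both differences to reach their thresholds mirrors the wording of the exclusion rule, in which failing either criterion is enough to exclude the case.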
Using computer-generated random numbers, CTG tracings were assigned to receive computer analysis by the Omniview-SisPorto 3.5 system (study group) or no analysis (control group). Computer evaluation of tracings was performed offline, but using a methodology that is similar to real-time analysis (i.e. processing starts after the first 10 minutes and is subsequently updated every minute, only taking into account signals that were acquired until that time point). Tracing printouts in the study group had the baseline drawn on the FHR graph, and accelerations, decelerations, contractions and periods with abnormal long-term and short-term variability were highlighted (Figure 1). The last alert elicited by the system was also displayed underneath the tracing. Tracings in the control group only displayed the usual FHR and uterine contraction signals. All tracings were printed at a paper speed of 1 cm/minute and were presented independently to three obstetricians with more than 5 years’ experience in CTG interpretation. With the information that tracings had been acquired in term pregnancies and that time intervals to delivery were those previously mentioned, they were asked to estimate the newborns’ UAB pH (to two decimal places) and 5-minute Apgar scores.
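The offline replay described above, in which analysis starts after the first 10 minutes and is then updated every minute using only the signal acquired so far, can be sketched as follows. This is an illustration only: the function names and the sampling rate are hypothetical, and the running mean is a stand-in for the real analysis, which is not reproduced here.

```python
from statistics import mean
from typing import Iterator, List, Sequence, Tuple


def replay_as_real_time(
    signal: Sequence[float],
    samples_per_minute: int,
    start_minute: int = 10,
) -> Iterator[Tuple[int, Sequence[float]]]:
    """Yield (minute, visible_signal) once per minute from start_minute on.

    At each step only the samples acquired up to that minute are exposed,
    so an analysis run on visible_signal cannot look ahead in the tracing.
    """
    total_minutes = len(signal) // samples_per_minute
    for minute in range(start_minute, total_minutes + 1):
        yield minute, signal[: minute * samples_per_minute]


def running_summary(
    signal: Sequence[float], samples_per_minute: int
) -> List[Tuple[int, float]]:
    """Placeholder analysis: mean of the visible signal at each update."""
    return [
        (minute, mean(visible))
        for minute, visible in replay_as_real_time(signal, samples_per_minute)
    ]


if __name__ == "__main__":
    # Twelve minutes of made-up samples at an assumed 4 samples per minute.
    fake_signal = [float(i) for i in range(48)]
    for minute, value in running_summary(fake_signal, samples_per_minute=4):
        print(minute, value)
```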
Interobserver agreement was assessed using the intraclass correlation coefficient (ICC) and limits of agreement (LA), both with 95% CI.12 The accuracy of clinicians’ estimations was assessed allowing a maximum error of 0.10 for UAB pH and a one-point difference for 5-minute Apgar score. It was evaluated by the percentage of correct estimations and by the agreement between estimated and real values using the LA with three observations per estimation.13 Values of the ICC exceeding 0.75 were interpreted as corresponding to an acceptable agreement. Measures of agreement were calculated using Microsoft Excel® 2003 and SPSS for Windows® version 10.0.7. The t test was used to compare the two arms regarding gestational age, birthweight, male births, cord artery pH and caesarean delivery, and the Mann–Whitney U test was used to compare Apgar scores and duration of the tracing.
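The two accuracy measures can be illustrated with a short Python sketch. It is not the authors' calculation: the function names are hypothetical, and the limits of agreement shown are the basic form (mean difference ± 1.96 SD of the differences), whereas the study used a variant that accounts for three observations per estimation.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def fraction_within_margin(
    predicted: Sequence[float], real: Sequence[float], margin: float
) -> float:
    """Share of predictions whose absolute error does not exceed margin."""
    hits = sum(
        1
        for p, r in zip(predicted, real)
        # Rounding avoids floating-point noise exactly at the margin.
        if round(abs(p - r), 6) <= margin
    )
    return hits / len(predicted)


def limits_of_agreement(
    predicted: Sequence[float], real: Sequence[float]
) -> Tuple[float, float]:
    """Basic 95% limits of agreement for predicted minus real values."""
    differences = [p - r for p, r in zip(predicted, real)]
    centre = mean(differences)
    spread = 1.96 * stdev(differences)
    return centre - spread, centre + spread


if __name__ == "__main__":
    # Made-up values, for illustration only.
    predicted_ph = [7.20, 7.25, 7.10, 7.30]
    real_ph = [7.25, 7.20, 7.25, 7.28]
    print(fraction_within_margin(predicted_ph, real_ph, margin=0.10))
    print(limits_of_agreement(predicted_ph, real_ph))
    # The same function covers the one-point margin used for Apgar scores.
    print(fraction_within_margin([9, 8, 10], [10, 10, 9], margin=1))
```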
The main obstetric characteristics of the study population are displayed in Table 1. Of the 204 tracings selected, 104 were randomised to receive computerised analysis (study group) and 100 to receive no analysis (control group). Five-minute Apgar scores were available in all cases, but valid UAB pH values were only present in 183 (96 in the study group and 87 in the control group). Clinicians were therefore asked to perform a total of 612 Apgar score estimations and 549 UAB pH estimations (288 in the study group and 261 in the control group).
Table 1. Randomisation table of the two study groups (visual versus access to computerised cardiotocograph analysis).
Visual (n = 100)
Computerised (n = 104)
Gestational age in weeks; mean (SD)
Birth weight in grams; mean (SD)
Male births n (%)
Duration (minutes) of the assessed trace median (minimum–maximum)
Cord artery pH mean (SD) (21 missing values)
5-min Apgar scores median (minimum–maximum)
Caesarean delivery n (%)
The mean value of real UAB pH was 7.23 with a standard deviation (SD) of 0.08. Mean values of UAB pH predicted by observers A, B and C were 7.22 (SD = 0.08), 7.20 (0.06) and 7.18 (0.06), respectively. Real 5-minute Apgar scores ranged from 6 to 10, whereas values predicted by observers A, B and C ranged from 8 to 10, 6 to 10 and 6 to 10, respectively. Of the three cases of metabolic acidosis (UAB pH <7.05 and base deficit concentration in the extracellular fluid >12 mmol/l) that were included in this study, none went on to develop hypoxic–ischaemic encephalopathy, and all were assigned to the study group (one was adequately predicted by all observers and the other two were predicted by one of the observers).
Table 2 displays the interobserver agreement obtained in prediction of UAB pH and 5-minute Apgar scores in tracings assigned to the study and to the control groups. Agreement in prediction of UAB pH, as calculated by the ICC, was significantly higher in the study group. A trend was also seen in this group towards increased interobserver agreement in prediction of 5-minute Apgar scores, but this did not reach statistical significance.
Table 2. Agreement between the three clinicians in prediction of UAB pH and Apgar scores in tracings with visual CTG analysis and with access to computerised CTG analysis (95% CI in brackets)
Limits of agreement
Intraclass correlation coefficient
Visual CTG Analysis
Computerised CTG Analysis
Visual CTG Analysis
Computerised CTG Analysis
Umbilical artery blood pH
A and B
B and C
A and C
0.41 (−0.01 to −0.67)
A, B and C
A and B
B and C
A and C
A, B and C
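For readers unfamiliar with the ICC, the sketch below computes one common form from a table of ratings (one row per tracing, one column per observer). The paper does not state which ICC model was requested from the statistics package, so the choice of the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), is an assumption, as is the function name. Confidence intervals are not computed.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) from an n-by-k table (n subjects, k raters), via two-way ANOVA."""
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns) and residual.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Made-up pH predictions by three observers for four tracings.
    example = [
        [7.20, 7.22, 7.18],
        [7.30, 7.28, 7.25],
        [7.10, 7.15, 7.12],
        [7.25, 7.20, 7.22],
    ]
    print(round(icc_2_1(example), 2))
```

Because this form penalises systematic differences between observers as well as random disagreement, an observer who consistently predicts lower values than the others reduces the coefficient.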
In the study group, observers predicted UAB pH values correctly, within a 0.10 margin, in 70% of cases (95% CI 0.61–0.79), whereas in the control group this occurred in 46% (95% CI 0.35–0.56). Figure 2 displays the individual differences between predicted and real pH values, together with the obtained LA values, in the study (−0.16; 0.11) and in the control group (−0.21; 0.14). For the 5-minute Apgar score, correct predictions, within a margin of one, were obtained in 81% of cases in the study group (95% CI 0.73–0.88), compared with 70% of cases in the control group (95% CI 0.61–0.79). Table 3 displays the agreement between observers and real UAB pH and also between all three observers and the real UAB pH, using the ICC as the statistical measure.
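The confidence intervals quoted for these proportions can be approximately reproduced with a normal-approximation (Wald) interval. The paper states neither the interval method nor the denominator, so both are assumptions here: the sketch uses the number of cases in each group (96 and 87 for UAB pH, 104 and 100 for Apgar scores) and a hypothetical helper name.

```python
import math
from typing import Tuple


def wald_interval(proportion: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation 95% confidence interval for a proportion."""
    half_width = z * math.sqrt(proportion * (1 - proportion) / n)
    return proportion - half_width, proportion + half_width


if __name__ == "__main__":
    # (label, reported proportion, assumed number of cases)
    reported = [
        ("UAB pH, study group", 0.70, 96),
        ("UAB pH, control group", 0.46, 87),
        ("Apgar, study group", 0.81, 104),
        ("Apgar, control group", 0.70, 100),
    ]
    for label, proportion, n in reported:
        low, high = wald_interval(proportion, n)
        print(f"{label}: {low:.3f} to {high:.3f}")
```

Under these assumptions the computed bounds fall within about 0.01 of the reported intervals; exact agreement is not expected, because the percentages in the text are rounded.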
Table 3. Agreement between the three clinicians in prediction of UAB pH in tracings with visual CTG analysis and with access to computerised CTG analysis (95% CI in brackets).
Intraclass correlation coefficient (n = 104)
Comparison | Computerised analysis group | Visual analysis group
A and real | 0.54 (0.38; 0.67) | 0.36 (0.16; 0.53)
B and real | 0.54 (0.35; 0.68) | 0.31 (0.10; 0.49)
C and real | 0.33 (0.13; 0.50) | 0.12 (−0.05; 0.30)
All observers and real | 0.52 (0.34; 0.66) | 0.29 (0.08; 0.47)
This study demonstrates that clinicians agree more with each other on prediction of UAB pH and are more accurate in this prediction when they have access to computerised analysis of CTG tracings. This suggests that clinicians are consciously or unconsciously influenced by the results of computerised CTG analysis, which may lead to a more homogeneous and more correct tracing interpretation. However, it is possible that the degree of influence will depend on their previous experience and on their personal confidence with the system.
Few studies have addressed the accuracy of healthcare professionals in predicting UAB pH and Apgar scores based on the intrapartum CTG. Chauhan et al.14 evaluated the accuracy of five clinicians in estimating UAB pH <7.00, base excess ≥12 mmol/l and Apgar scores ≤3 at 5 minutes by visual analysis of CTG tracings. One hundred intrapartum nonreassuring FHR tracings were reviewed, from 1 hour before the appearance of CTG abnormalities and, if applicable, the hour before delivery. Large discrepancies were found in prediction of neonatal outcome variables. Overall Spearman correlation coefficients for prediction of low Apgar score, low pH and abnormal base excess ranged from 0.11 to 0.19, demonstrating no positive association between predicted and real outcomes. This could have been the result of the poor interobserver agreement found in visual classification of CTG tracings (weighted Kappa coefficients ranging from −0.12 to 0.15). This study concluded that visual analysis of intrapartum CTGs is not a useful diagnostic test for the identification of fetuses born with low Apgar score or abnormal acid–base state (likelihood ratio 1–2).
Nielsen et al.15 compared the prediction of fetal outcome obtained by a computer system with that of four experienced obstetricians performing visual analysis of CTGs. The final 30 minutes of 50 intrapartum tracings were evaluated. A dichotomised classification of fetal outcome as normal or compromised was used, the latter defined as a 1-minute Apgar score <7, UAB pH <7.15, base excess ≤10 mmol/l or the need for primary resuscitation. The computer system obtained an accuracy of 86%, which was significantly higher than that of obstetricians (50–66%).
One may consider that a correct prediction of UAB pH values, within a 0.10 margin, in 70% of tracings is not impressive, particularly given the standard deviation in this value of 0.08. The distribution of UAB values in the general population may lead to similar results if an average value is always predicted. Nevertheless, the accuracy of prediction was significantly higher in the study group and the only possible explanation for this is the access to computerised CTG results. Prediction of Apgar scores from CTG tracings has a more limited value, as they are known to be affected by several factors other than oxygenation and they are subject to high interobserver variability. It was interesting to find nonsignificant trends towards a higher reproducibility and accuracy of 5-minute Apgar score prediction in the group that had access to computerised CTG. However, it is not known whether a larger sample size would lead to a significant result.
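The remark about always predicting an average value can be given a rough number. The sketch below assumes that real UAB pH values are approximately normally distributed around their mean with the observed SD of 0.08; normality is an assumption, not a finding of the study, and the function name is hypothetical.

```python
import math


def prob_within_margin_of_mean(sd: float, margin: float) -> float:
    """P(|X - mean| <= margin) for a normally distributed X with the given SD."""
    return math.erf(margin / (sd * math.sqrt(2)))


if __name__ == "__main__":
    # SD of real UAB pH and the error margin used in the study.
    share = prob_within_margin_of_mean(sd=0.08, margin=0.10)
    print(f"{share:.1%}")
```

Under this assumption about 79% of real values lie within 0.10 of the mean, so a constant prediction at the mean would be counted as correct in roughly that share of cases.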
When evaluating clinicians’ accuracy in the prediction of neonatal outcome parameters, the decision was made to assess precision within a margin of error, rather than grouping cases into dichotomous classes and evaluating the sensitivity and specificity. The wide variation in normal UAB pH values is well known, and attention has traditionally focused on cases with pH <7.05 or pH <7.00, particularly when associated with base deficit exceeding 12 mmol/l (metabolic acidosis). However, this approach requires a much larger sample size and/or an artificial selection of poor outcome cases. Evaluating the degree of fetal acidaemia, rather than identifying dichotomous classes, is also a clinically useful objective, because this is frequently employed when deciding the timing of a clinical intervention, which should ideally be performed before the onset of metabolic acidosis.
It is well known that fetal oxygenation can deteriorate rapidly in unstable intrapartum situations. In this study great care was taken to reduce periods of signal loss and keep the CTG to delivery time interval to a minimum, so that the studied CTG pattern reflected as much as possible the fetal oxygenation at the time of delivery, hence the 5-minute interval to vaginal birth and 20-minute interval to caesarean section. In spite of this, it is acknowledged that particularly the latter period could have introduced some uncertainty into the results.
We have shown that access to computerised CTG analysis improves prediction of UAB pH. This has the potential to improve clinicians’ management decisions based on intrapartum CTG. Whether this will reduce the incidence of adverse neonatal outcomes needs to be adequately evaluated in a randomised controlled trial.
Disclosure of interest
The Institute of Biomedical Engineering receives royalties from the commercialisation of the Omniview-SisPorto program, which are not distributed among the inventors but are used solely for the promotion of further research.
Contribution to authorship
All authors fulfilled all conditions required for authorship and approved the submission.
Details of ethics approval
The cases included in this study were obtained from pre-existing databases acquired in the context of previously conducted research approved by the S. João Hospital ethics committee.11,13 All participants gave written informed consent for their clinical data and CTG files to be used for the purpose of research. No person-identifiable data were disclosed in the present study. The committee considered that their previous evaluations cover the specific purpose and methods of the present study.
Funding
This study was not financially supported. The authors’ institutions sponsored the authors’ time dedicated to the research.
1 Background: Appraise the evidence with regard to the relation between Apgar scores or umbilical artery blood (UAB) pH and cardiotocograph (CTG) traces. Discuss the evidence with regard to variability in interpretation of CTGs. Computerised analysis of CTGs is one of the possible ways to reduce this variability in interpretation and management of CTGs; are you aware of any other methods? Describe their individual merits.
2 Methods: Discuss the steps taken by the authors to reduce bias. Appraise the use of the intraclass correlation coefficient as an outcome measure. Would you prefer to use the sensitivity and specificity for detection of cases with a UAB pH <7.1 as an outcome measure? What would be the advantages of the latter approach, compared with the former, with reference to the cerebral palsy template? What would be the disadvantages, with reference to the study size?
3 Results and implications: Discuss the fact that three experienced clinicians were able to predict the UAB pH correctly with a 0.1 margin in less than 50% of cases when using a standard CTG alone, without the use of any decision support systems. As the standard deviation of 0.08 indicates that even a random guess would have a good chance of achieving similar accuracy, discuss the implications for relying on standard CTGs alone to assess fetal wellbeing and decide management. The results showed that prediction of UAB pH may be more reproducible and accurate when clinicians have access to computerised analysis of CTGs. What additional information would you hope to get from future studies, before considering implementation of computerised analysis?