Determining obstetric patient safety indicators: the differences in neonatal outcome measures between different-sized delivery units




Objective

To study the differences in neonatal outcome and treatment measures in Finnish obstetric units.


Design

A registry study with Medical Birth Register data.

Setting and population

All births (n = 294 726) in Finland from 2006 to 2010, with a focus on term, singleton non-university deliveries.


Methods

All 34 delivery units were grouped by the number of annual deliveries into small (below 1000), mid-sized (1000–2999) and large (3000 or more) units, and the adverse outcome rates in neonates were compared using logistic regression.

Main outcome measures

Early neonatal deaths, stillbirths, Apgar scores, arterial cord pH, Erb's paralysis, respirator treatment, the proportion of post-term deliveries (gestational age beyond 42 weeks) and the proportion of newborns still hospitalised 7 days after delivery.


Results

In an analysis of term, singleton non-university deliveries, the early neonatal mortality was significantly higher in the small relative to the mid-sized delivery units [odds ratio (OR), 2.07; 95% confidence interval (CI), 1.19–3.60]. The rate of Erb's paralysis was lowest in the large units (OR, 0.65; 95% CI, 0.50–0.84). The use of a respirator was more than two-fold more common in large relative to mid-sized units (OR, 2.38; 95% CI, 2.00–2.83). The proportion of post-term deliveries was highest in the large units (OR, 1.36; 95% CI, 1.31–1.42), where a significantly higher percentage of post-term newborns were still hospitalised after 7 days (OR, 1.50; 95% CI, 1.19–1.89).


Conclusions

There are significant differences in several neonatal indicators dependent on the hospital size. An international consensus is needed on which indicators should be used.


The identification and definition of reliable, nationally and internationally useful quality measures in obstetrics have remained great challenges.[1] This has not been caused solely by a lack of interest or investigation on the topic – there have been a few comprehensive, mostly American, papers addressing the issue,[1-3] and it has been shown that the implementation of an obstetric patient safety programme is likely to decrease the incidence of adverse outcomes.[2, 4] Despite this, an international consensus on recommended indicators is still lacking.

The first and most fortunate problem within obstetric patient safety measures is that adverse outcomes are rare. Second, process measures are not as clearly and easily reported as outcome measures, and they are less likely to engage public attention because they involve several intermediate steps rather than easy-to-understand outcomes. The traditional, widely recommended patient safety indicators in the field of obstetrics include three outcome measures: obstetric trauma with instrument, obstetric trauma without instrument, and birth trauma (injury to the neonate).[3, 5, 6] In our previous paper,[7] we studied maternal outcome (obstetric trauma) as a patient safety indicator, whereas the aim of this study was to focus on perinatal measures.

In previous publications, a variety of neonatal outcomes, as well as process measures, have been studied with different conclusions with regard to their usability. In addition to birth trauma, the main indicators used have included stillbirths,[8, 9] early and late neonatal deaths,[9] low Apgar score,[8-10] low arterial cord pH[8] or standard base excess,[10] admittance to a neonatal intensive care unit (NICU)[8] and a set of different maternal and neonatal indicators merged into an adverse outcome index (AOI).[2] With the highly reliable Finnish Medical Birth Register (MBR),[11] many of these indicators are easily accessible and can be readily used to analyse patient safety nationwide or, for example, within healthcare districts or between different hospitals. This would be useful in tracking the trends in quality of care and would probably result in a decreased incidence of adverse outcomes.

The aim of this study was to report the current situation in Finland among the low-risk population by studying the differences between different-sized delivery units using an ample set of different neonatal outcome and process measures, and to analyse their utility as patient safety indicators within obstetrics and neonatology.


All hospital births (n = 294 726) in Finland between 2006 and 2010 were included in the study. However, we focused our analyses on a low-risk population, which we defined as term, singleton deliveries with a gestational age (GA) of 37 weeks or more taking place in non-university clinics (n = 180 368). Centralisation is very well implemented in Finland; for example, 85% of the significantly preterm (GA < 32 weeks) newborns are born in university clinics. By leaving the university clinics out of our main analyses, we could therefore exclude most of the high-risk pregnancies and eliminate the confounding effect of the superior neonatological responsiveness in these clinics. In addition, we performed separate analyses for deliveries with a GA of 42 weeks or more (n = 15 020) and also analysed the proportion of these significantly post-term deliveries, because they are known to be associated with higher neonatal mortality and complication rates.[12, 13]

We analysed several different adverse neonatal outcome and process measures, and compared the outcome rates between different-sized delivery units. The studied indicators (Table 1) were chosen on the basis of previous publications[2, 8-10, 14] and on preliminary analyses of the availability of register data needed for the indicators and their feasibility in Finland. Admittance to an NICU was left out because, with this specific parameter, there are known differences in terminology and reporting between hospital districts, which would have confounded the results excessively. Instead, we included respirator treatment and the proportion of newborns still hospitalised 7 days after delivery as process measures because of their clear definitions.

Table 1. Studied indicators

| Outcome measures | Process measures |
| --- | --- |
| Perinatal mortalityᵃ | Respirator treatment |
| Neonatal mortalityᵃ | The number of newborns still hospitalised 7 days after delivery |
| Early neonatal mortalityᵃ | |
| Stillbirths (per 1000 live births) | |
| Apgar below 4 and 7 at 5 minutes | |
| pH below 6.95, 7.00, 7.05 and 7.10 | |
| Birthweight more than 4500 g | |
| Erb's paralysisᵇ | |
| Fracture of claviculaᵇ | |
| Birth traumaᶜ | |

ᵃ Perinatal mortality was defined as the number of stillbirths and deaths in the first week of life per 1000 births, early neonatal mortality as the number of deaths in the first week of life per 1000 live births and neonatal mortality as the number of deaths during the first 28 days of life per 1000 live births (World Health Organization, 2006). In Finland, the stillbirth rate is calculated from 22 weeks onwards.

ᵇ Discharges with International Classification of Diseases, Tenth Revision (ICD-10) codes for Erb's paralysis (P14) and fracture of the clavicula (P13.4).

ᶜ Discharges with ICD-10 codes for birth trauma (P10–15) and intraventricular nontraumatic haemorrhage (P52), including all deliveries with a newborn weighing more than 2000 g [modified from the Agency for Healthcare Research and Quality (AHRQ) definition].
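The mortality definitions in Table 1 differ mainly in their denominators, which the following minimal Python sketch tries to make explicit. The event counts are the totals given in Table 3 for the low-risk population (67 early neonatal deaths and 192 stillbirths per 180 176 live births); the helper name `rate_per_1000` and the assumption that total births equal live births plus stillbirths are illustrative and not taken from the register documentation.

```python
def rate_per_1000(events: int, denominator: int) -> float:
    """Return the number of events per 1000 units of the denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 1000 * events / denominator


# Totals from Table 3 (term, singleton deliveries, university clinics excluded).
early_neonatal_deaths = 67
stillbirths = 192
live_births = 180_176

# Assumption for illustration: all births = live births + stillbirths.
total_births = live_births + stillbirths

# Early neonatal mortality: first-week deaths per 1000 live births.
early_neonatal_mortality = rate_per_1000(early_neonatal_deaths, live_births)

# Stillbirths are reported per 1000 live births in this study.
stillbirth_rate = rate_per_1000(stillbirths, live_births)

# Perinatal mortality: stillbirths plus first-week deaths per 1000 births.
perinatal_mortality = rate_per_1000(stillbirths + early_neonatal_deaths, total_births)

print(f"Early neonatal mortality: {early_neonatal_mortality:.2f} per 1000 live births")
print(f"Stillbirth rate: {stillbirth_rate:.2f} per 1000 live births")
print(f"Perinatal mortality: {perinatal_mortality:.2f} per 1000 births")
```

Neonatal mortality (deaths during the first 28 days per 1000 live births) would use the same helper with the live-birth denominator; it is not computed here because the corresponding count is not reported in the tables.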

The officially recommended patient safety indicator, birth trauma, consists of a set of different diagnosis codes, and there are significant differences between the definitions for the indicator.[5, 15] In addition, all definitions are based on the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes,[16] which are not widely used in Europe. Therefore, we decided to use the rates of Erb's paralysis and fracture of the clavicula as birth trauma measures. In addition, we constructed a separate birth trauma indicator adapted from the Agency for Healthcare Research and Quality's latest definition[15] using International Classification of Diseases, Tenth Revision (ICD-10) codes.[17] According to the latest definition, we performed the analysis including all newborns with a birthweight of 2000 g or more.

All 34 delivery units were grouped into six categories according to the number of total annual deliveries (Table 2) and the rates for all measures listed in Table 1 were analysed for each size category. The comparisons were then performed between the small (<1000 annual deliveries), mid-sized (1000–2999) and large (3000 or more) units using a logistic regression analysis.
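The grouping rule described above can be sketched in a few lines of Python. This is only an illustration of the thresholds stated in the text and in Table 2; the function names `hospital_group` and `size_category` are hypothetical and do not come from the study's analysis code.

```python
from bisect import bisect_right

# Lower bounds of the second to sixth hospital groups (annual deliveries).
GROUP_BOUNDS = [500, 1000, 2000, 3000, 5000]
GROUP_LABELS = ["<500", "500–999", "1000–1999", "2000–2999", "3000–4999", "5000 or more"]


def hospital_group(annual_deliveries: int) -> str:
    """Return one of the six hospital groups by annual delivery volume."""
    if annual_deliveries < 0:
        raise ValueError("annual_deliveries must be non-negative")
    return GROUP_LABELS[bisect_right(GROUP_BOUNDS, annual_deliveries)]


def size_category(annual_deliveries: int) -> str:
    """Return the three-level size category used in the comparisons."""
    if annual_deliveries < 0:
        raise ValueError("annual_deliveries must be non-negative")
    if annual_deliveries < 1000:
        return "small"
    if annual_deliveries < 3000:
        return "mid-sized"
    return "large"


# Hypothetical annual volumes, chosen only to exercise each boundary.
for volume in (450, 999, 1000, 2999, 3000, 6200):
    print(volume, hospital_group(volume), size_category(volume))
```

In the comparisons, the mid-sized category serves as the reference group against which the small and large units are contrasted.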

Table 2. Hospital groups

| Hospital group by the number of annual deliveries (total) | Size category | Number of units (number of university clinics) | Instrumental delivery rate, % (university clinics excluded) | Caesarean section rate, % (university clinics excluded) | Number of singleton, term deliveries (GA 37 weeks or more, university clinics excluded) | % of total deliveries (% total university clinics excluded) |
| --- | --- | --- | --- | --- | --- | --- |
| <500 | Small units | 5 (0) | 7.6 (7.6) | 20.5 (20.5) | 6232 (6232) | 2 (4) |
| 500–999 | Small units | 10 (0) | 7.5 (7.5) | 14.3 (14.3) | 33 634 (33 634) | 12 (19) |
| 1000–1999 | Mid-sized units | 8 (0) | 8.6 (8.6) | 15.6 (15.6) | 53 092 (53 092) | 19 (29) |
| 2000–2999 | Mid-sized units | 4 (1) | 7.5 (7.8) | 14.6 (15.0) | 40 483 (29 254) | 15 (16) |
| 3000–4999 | Large units | 4 (2) | 8.0 (8.0) | 14.9 (15.8) | 68 949 (30 791) | 25 (17) |
| 5000 or more | Large units | 3 (2) | 9.7 (10) | 18.0 (15.2) | 73 897 (27 365) | 27 (15) |
| Total | | 34 (5) | 8.4 (8.4) | 15.9 (15.4) | 276 287 (180 368) | 100 (100) |

GA, gestational age.

The data for the study were obtained from the statutory Finnish MBR, which has excellent data quality and high completeness.[11] The MBR is maintained by the National Institute for Health and Welfare (THL), which gave the required authorisation for the use of sensitive health register data in scientific research, as required by national data protection legislation. Only anonymised data were used. No informed consent of the registered persons is needed for register-based studies in Finland.

The MBR includes information on maternal and neonatal birth characteristics and perinatal outcomes (all live born infants and stillborns with a GA beyond 22 weeks or weighing 500 g or more). The data are routinely collected from the clinical records of all the delivery units and verified with the hospitals, if needed. Missing cases are identified from the Hospital Discharge Register (hospital births), Central Population Register (live births) and Cause-of-Death Register (stillbirths and infant deaths). For most of the studied indicators, the reporting levels were nearly 100%. The reporting levels of cord arterial pH, however, varied greatly between the delivery units, with an average of 76%. The units reporting <10% (n = 6) were excluded from the pH analyses. After the exclusion, the average reporting rates were 67% in the small, 77% in the mid-sized and 92% in the large units.
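The exclusion rule for cord arterial pH can be illustrated with a short Python sketch. The unit identifiers and counts below are hypothetical and serve only to show the 10% reporting threshold; they are not register data, and the class and function names are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitReporting:
    """Hypothetical per-unit summary of cord arterial pH reporting."""
    unit_id: str
    deliveries: int
    ph_reported: int

    @property
    def reporting_rate(self) -> float:
        return self.ph_reported / self.deliveries


def units_for_ph_analysis(units, min_rate=0.10):
    """Keep only units whose pH reporting rate is at least min_rate."""
    return [u for u in units if u.reporting_rate >= min_rate]


# Hypothetical units: B reports pH for only 5% of deliveries and is excluded.
units = [
    UnitReporting("A", deliveries=1000, ph_reported=920),
    UnitReporting("B", deliveries=800, ph_reported=40),
    UnitReporting("C", deliveries=2500, ph_reported=1900),
]

included = units_for_ph_analysis(units)
print([u.unit_id for u in included])
```

Applying the threshold per unit, rather than per size category, mirrors the description in the text that whole units were dropped from the pH analyses.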

Although the analyses were performed with university clinics included and excluded, the results are reported only for the latter. In addition, we report the results only for term deliveries unless stated otherwise. The complete data with all deliveries and university clinics included can be found in Table S1 in Supporting information.


The risk for early neonatal mortality was highest in the small units, but there were no statistically significant differences in stillbirths in the low-risk population (Table 3, Figure 1). However, when the preterm and multiple deliveries were included, there were fewer stillbirths in the large than in the mid-sized units [odds ratio (OR), 0.74; 95% confidence interval (CI), 0.60–0.91].

Table 3. Results for the essential indicators, term deliveries (GA of 37 weeks or more), university clinics excluded (n = 180 368)

| Indicator | Unit size | n | Rate (%) | OR | 95% CI |
| --- | --- | --- | --- | --- | --- |
| Early neonatal mortality (per 1000); n = 67, per 180 176 live births | Small units | 39 827 | 0.63 | 2.07ᵃ | 1.19–3.59 |
| | Mid-sized units | 82 246 | 0.30 | 1 | |
| | Large units | 58 103 | 0.29 | 0.96 | 0.52–1.78 |
| Stillbirths (per 1000); n = 192, per 180 176 live births | Small units | 39 827 | 0.98 | 0.81 | 0.56–1.17 |
| | Mid-sized units | 82 246 | 1.22 | 1 | |
| | Large units | 58 103 | 0.91 | 0.75 | 0.54–1.05 |
| Umbilical cord pH < 7.05 (hospitals reporting <10% are excluded, n = 154 018) | Small units | 27 887 | 0.96 | 1.06 | 0.92–1.21 |
| | Mid-sized units | 82 346 | 0.91 | 1 | |
| | Large units | 43 785 | 1.13 | 1.24ᵃ | 1.11–1.39 |
| Umbilical cord pH < 7.10 (hospitals reporting <10% are excluded, n = 154 018) | Small units | 27 887 | 2.19 | 0.91ᵃ | 0.83–1.00 |
| | Mid-sized units | 82 346 | 2.41 | 1 | |
| | Large units | 43 785 | 3.29 | 1.35ᵃ | 1.27–1.45 |
| Apgar below 7 at 5 minutes | Small units | 39 866 | 1.59 | 1.07 | 0.97–1.18 |
| | Mid-sized units | 82 346 | 1.48 | 1 | |
| | Large units | 58 156 | 1.19 | 0.80ᵃ | 0.73–0.88 |
| Apgar below 4 at 5 minutes | Small units | 39 866 | 0.23 | 0.72ᵃ | 0.57–0.92 |
| | Mid-sized units | 82 346 | 0.32 | 1 | |
| | Large units | 58 156 | 0.30 | 0.95 | 0.78–1.14 |
| Erb's paralysis | Small units | 39 866 | 0.18 | 0.80 | 0.61–1.04 |
| | Mid-sized units | 82 346 | 0.23 | 1 | |
| | Large units | 58 156 | 0.15 | 0.65ᵃ | 0.51–0.84 |
| Fracture of clavicula | Small units | 39 866 | 1.27 | 1.02 | 0.92–1.13 |
| | Mid-sized units | 82 346 | 1.24 | 1 | |
| | Large units | 58 156 | 0.57 | 0.46ᵃ | 0.41–0.52 |
| Birth traumaᵇ (n = 186 272) | Small units | 39 866 | 2.17 | 1.13ᵃ | 1.04–1.22 |
| | Mid-sized units | 82 346 | 1.93 | 1 | |
| | Large units | 58 156 | 1.46 | 0.76ᵃ | 0.70–0.82 |
| Respirator treatment | Small units | 39 866 | 0.25 | 0.99 | 0.78–1.25 |
| | Mid-sized units | 82 346 | 0.26 | 1 | |
| | Large units | 58 156 | 0.55 | 2.13ᵃ | 1.79–2.54 |
| Newborns still hospitalised 7 days after delivery | Small units | 39 866 | 2.32 | 0.84ᵃ | 0.78–0.90 |
| | Mid-sized units | 82 346 | 2.78 | 1 | |
| | Large units | 58 156 | 2.96 | 1.06ᵃ | 1.00–1.13 |

CI, confidence interval; OR, odds ratio. Mid-sized units are the reference category (OR = 1).

ᵃ Statistically significant.

ᵇ Discharges with International Classification of Diseases, Tenth Revision (ICD-10) codes for birth trauma (P10–15) and for intraventricular nontraumatic haemorrhage (P52), including all deliveries with a newborn weighing more than 2000 g [modified from the Agency for Healthcare Research and Quality (AHRQ) definition].
Figure 1. Early neonatal deaths and stillbirths per 1000 live births in term deliveries in each hospital group (university clinics excluded).
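To illustrate how an odds ratio of the kind shown in Table 3 arises, the sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2×2 table. Several points are assumptions rather than the authors' method: the study used logistic regression, which for a single categorical predictor without covariates yields the same odds ratio as the cross-product ratio; the Wald interval is one common choice and the paper does not state which interval was used; and the death counts are back-calculated from the rounded rates in Table 3, so they are approximate.

```python
import math


def odds_ratio_wald_ci(events_group, n_group, events_ref, n_ref, z=1.96):
    """Unadjusted odds ratio of group vs reference with a Wald confidence interval."""
    a = events_group              # events in the comparison group
    b = n_group - events_group    # non-events in the comparison group
    c = events_ref                # events in the reference group
    d = n_ref - events_ref        # non-events in the reference group
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for a Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Live births per size category (Table 3, early neonatal mortality rows).
small_n, mid_n = 39_827, 82_246

# Approximate death counts reconstructed from the rounded rates per 1000.
small_deaths = round(0.63 * small_n / 1000)
mid_deaths = round(0.30 * mid_n / 1000)

or_small, ci_low, ci_high = odds_ratio_wald_ci(small_deaths, small_n, mid_deaths, mid_n)
print(f"OR {or_small:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

With these reconstructed counts, the result comes out close to the reported odds ratio for early neonatal mortality in the small units; small discrepancies in the last decimal are to be expected because the inputs are rounded.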

The results for the indicators of birth asphyxia, Apgar score and arterial cord pH, were inconsistent (Table 3). The large units reported higher rates of low arterial cord pH (<7.05 and <7.10) than the mid-sized units. The difference was more pronounced with the higher limit (pH 7.10), and statistically insignificant with the lower limits tested (pH 6.95 and pH 7.00). In contrast, the rate of low Apgar scores at 5 minutes was significantly lower in the large units when including scores <7, but not statistically significantly different from the mid-sized units when including scores <4 only. In the small units, the rate of Apgar scores <4 was lower than in the mid-sized units, but the rate was not significantly different with the higher Apgar score limit (<7). In addition, the small units reported a lower rate of arterial cord pH of <7.10.

Erb's paralysis and fracture of the clavicula were less likely to occur in the large units than in the mid-sized units. When using the modified indicator of birth trauma, the incidence was higher in the small and lower in the large units compared with the mid-sized units (Table 3). It is noteworthy that there were no significant differences in the caesarean section rates (Table 2) or in the proportion of deliveries of a newborn weighing 4500 g or more (data not shown) between the different hospital size categories. In addition, there were fewer vaginal instrumental deliveries in the small units (OR, 0.90; 95% CI, 0.86–0.94) and more in the large units (OR, 1.08; 95% CI, 1.04–1.11) than in the mid-sized ones (Table 2).

The use of a respirator varied significantly according to the hospital size, with use being higher the larger the unit. The use was more than two-fold higher in the large units compared with the mid-sized ones (OR, 2.05; 95% CI, 1.71–2.46) even after excluding the post-term (GA of 42 weeks or more) deliveries (Table 3).

The proportion of newborns still hospitalised 7 days after delivery was significantly lower in the small units and higher in the large units than in the mid-sized ones (Table 3). When the post-term (GA of 42 weeks or more) deliveries were excluded, this proportion was no longer statistically significantly higher in the large units.

When analysing post-term deliveries (GA of 42 weeks or more), only a few of the differences in the studied indicators were statistically significant because of the small numbers. It is noteworthy that the use of a respirator (OR, 2.58; 95% CI, 1.47–4.53) and the proportion of newborns still hospitalised 7 days after delivery were significantly higher in the large units (OR, 1.50; 95% CI, 1.19–1.89). The rates of arterial cord pH <7.10 (OR, 0.64; 95% CI, 0.46–0.90) were significantly lower in the small than in the mid-sized units, but the results for other indicators for asphyxia were not statistically significant. There were significantly lower rates of fracture of the clavicula in the large than in the mid-sized units (OR, 0.50; 95% CI, 0.33–0.75), but no statistically significant differences were seen in the rate of Erb's paralysis.

In addition, the proportion of post-term deliveries was significantly higher in the large than in the mid-sized units (OR, 1.36; 95% CI, 1.31–1.42). The difference became more pronounced with increasing gestational age (Figure 2), with a steady increase in the OR (at 42 + 0 weeks: OR, 1.14; 95% CI, 1.05–1.25; at 42 + 1 weeks: OR, 1.64; 95% CI, 1.52–1.77; at 42 + 2 weeks: OR, 1.74; 95% CI, 1.57–1.93). In the small units, the proportion was lower than in the mid-sized units at 42 + 1 (OR, 0.89; 95% CI, 0.80–0.98) and 42 + 2 (OR, 0.86; 95% CI, 0.75–0.99) gestational weeks.

Figure 2. Deliveries according to gestational age in each hospital group (university clinics excluded), %.


Main findings

Our study shows significant differences in adverse outcome rates in low-risk non-university deliveries with several different neonatal outcome and process measures studied, suggesting differences in treatment culture dependent on the size of the delivery unit.

We found significantly higher rates of early neonatal mortality in the small relative to the larger units. The risk for Erb's paralysis and fracture of the clavicula was lowest in the largest units. There were fewer prolonged hospitalisations of newborns in the small than in the larger units. Interestingly, there were significant differences in the proportion of late post-term pregnancies between the different-sized units. The large units were more likely to have prolonged pregnancies and showed a higher proportion than the mid-sized units of post-term newborns still hospitalised 7 days after delivery.

Strengths and limitations

The source of the data was the highly reliable MBR, which enabled us to use a nationally comprehensive population of nearly 300 000 births. We did not adjust the analyses by maternal risk factors. However, the high-risk pregnancies are strongly centralised into university clinics, which were excluded from the main analyses. The use of birth trauma as a patient safety indicator was complicated because of problems with the definition, which is still based on ICD-9-CM codes,[16] and the possible failures in reporting of the very rare complications which make up the indicator.


The trend in Finland has been towards fewer and larger delivery units (31 units by the end of 2011, with an average of 1910 annual deliveries) and, according to our results on early neonatal mortality, this is a sensible trend. Although the risk for stillbirth was similar in all hospital size categories when analysing the low-risk population, a lower risk was observed in the large units when preterm and multiple deliveries were included. Stillbirth is one of the few maternal and child health-related complications that has not declined in recent decades, and the emerging view is that most normally formed singleton stillbirths without congenital anomalies are potentially preventable.[18, 19] Added to this, our primary finding highlights that this matter needs to be studied further and suggests that multiple pregnancies should be centralised into large units.

Some of the process measures used indicated the centralisation of high-risk deliveries into larger, non-university units, significant differences in treatment culture dependent on hospital size, or both. According to our study, newborns were less likely to be treated for long periods in small units, which was expected as these units treat low-risk deliveries. In the large units, the proportion of newborns still hospitalised at the age of 7 days was slightly higher than in the mid-sized units, and the use of a respirator was twice as frequent, indicating higher morbidity of newborns in these units. Alternatively, this might also be a sign of overtreatment, which, in turn, may cause unnecessary harm to the patient.

We found longer treatment periods and greater use of a respirator among post-term newborns (GA of 42 weeks or more) in units with the highest proportion of post-term deliveries. This indicates that neonatal morbidity increases with prolonged GA. Recent studies have shown that post-term pregnancy significantly increases neonatal mortality, whereas induction of labour at term gestation (GA of 40 weeks) does not increase the risk for operative or instrumental vaginal delivery.[12, 20]

Previously, it has been shown that Erb's paralysis is not well predicted by the known risk factors, such as shoulder dystocia or macrosomia, and that a higher caesarean section rate and operative vaginal delivery are not associated with a lower incidence of trauma.[21-23] This is in accordance with our results. The lower rates of Erb's paralysis in the large units could be associated with the larger volume of deliveries, leading to more experience of the manoeuvres used in order to prevent trauma to the brachial plexus.

Although the birth asphyxia markers arterial cord pH and Apgar score are known to be correlated,[24] our findings with these indicators were contradictory, highlighting the limitations of their use as patient safety indicators. An objective outcome measure, such as pH, is not as feasible as the Apgar score, which is given to every newborn. However, widespread availability should not be a reason to rely on an indicator, and our results do not support the use of the Apgar score, although it is still widely regarded as one of the most important neonatal outcome indicators.[14, 25]


Neonatal outcome and process measures provide valuable information and should be implemented into clinical work. Early neonatal mortality has been proven to be a sensitive indicator for inter-hospital comparison and should be monitored together with stillbirth rates. Although the differences in some of the process measures, such as respirator use, raised the question of possible overtreatment in the large units, most of the outcome measures showed better results in these units. The implications of higher morbidity of late post-term newborns, together with the great inter-hospital variation in the proportion of deliveries with a GA beyond 42 weeks, indicate that there might be a need for a uniform and earlier induction policy. In addition, birth trauma, the traditional indicator, needs to be developed further and updated according to the current ICD classification.

National indicator projects for obstetric quality measures should be established in order to provide data not only for national use, but also for international forums, which are still lacking a consensus on recommended indicators. This work would enable national and international benchmarking, but, more importantly, enhance the pursuit of improved clinical practice with fewer adverse outcomes.

Disclosure of interests

No part of this material has been published elsewhere or has been submitted for publication elsewhere. None of the authors have any potential conflicts of interest to be disclosed.

Contribution to authorship

AP conceived the study in collaboration with MG, MJ, JP and A-MT. MG retrieved register data and assisted AP in statistical analyses. All authors contributed to data interpretation. AP wrote the first draft of the manuscript, and all authors contributed to the revision and accepted the final version.

Details of ethics approval

The analyses were performed after the register keeping organisation, THL, had given the authorisation required by the national data protection legislation.


Funding

No special funding.



Commentary on ‘How outcome studies can lead to improvements in obstetric quality’

The aim of measuring quality is to improve quality. Through measurement, we learn which hospitals have the best outcomes and can study the processes that help the hospital to achieve these good outcomes. In this issue of BJOG, Pyyönen et al. present work examining differences in outcomes for low-risk term pregnancies in Finland. They show that the size of the hospital is associated with differences in neonatal outcomes. Interestingly, good outcomes are not concentrated in small or large hospitals. Instead, low-volume hospitals show better outcomes in some areas (pH < 7.10 and Apgar < 4 at 5 minutes), whereas large hospitals show less obstetric trauma, fractured clavicles and Erb's palsies.

The processes that lead to the various neonatal outcomes studied by Pyyönen et al. are not the same. For example, the rapid and safe relief of a shoulder dystocia without incurring a neonatal injury does not require the same skill set as judging when a fetus is intolerant of labour and requires expedited delivery. There is no reason to believe that, because an institution is good at one of these skills, it will be good at both. Furthermore, there may be systematic reasons why these skill sets and organisational resources might be better at one hospital than another. As the authors speculate, perhaps large hospitals have more opportunities to perform shoulder dystocia manoeuvres and thus, on the whole, are better at them. The link between procedure outcomes and hospital volumes has been shown in other areas of medicine (Birkmeyer et al. N Engl J Med 2002;346:1128–37). Alternatively, in smaller hospitals with few staff members, one could speculate that the need for more lead time to gather the multidisciplinary team necessary for a caesarean section may require more situational awareness and attention to the fetal monitor. Thus, the need for proactive lead time to gather a team may result in more attention paid to the subtle signs of trouble.

Wise use of the processes that lead to good outcomes in obstetrics is not the same as maximising the use of these routines. Obstetrics is unique in that two individuals are affected by the health care provided. Furthermore, the outcomes of one may be adversely affected by the care that may help the other. The classic example of this is caesarean section for fetal intolerance of labour. Although the fetus may have a better outcome, the mother is at increased risk for bleeding and infection as well as future complications (Kyser et al. Am J Obstet Gynecol 2012;207:e1–17). What is unclear is whether the processes that lead to worse maternal outcomes lead to better neonatal outcomes, and vice versa.

Once the processes leading to a balance between the outcomes for mother and child are found, understanding the forces that make an institution able to consistently and reliably carry them out is key. For example, staffing patterns are a force that may influence quality (Landrigan et al. N Engl J Med 2004;351:1838–48). It is through understanding the forces that create consistent quality that we will ultimately improve the quality of care at all hospitals. Studies such as that by Pyyönen et al. provide us with the insights that will ultimately lead to an improvement of the quality of obstetric care.

Disclosure of interests

The authors declare they have no conflict of interests.

  • JL Bailit

  • Departments of Obstetrics and Gynecology and the Center for Health Care Research and Policy, Metrohealth Medical Center, Cleveland, OH 44109, USA