Use of a confidence scale in reporting postmortem fetal magnetic resonance imaging

Abstract

Objectives

Postmortem magnetic resonance imaging (MRI) may be an alternative to conventional autopsy. However, it is unclear how confident radiologists are in reporting such studies. We sought to determine the confidence with which radiologists report on various fetal organs by developing a scale to express their confidence of normality and abnormality, and to place this in the context of a pathological diagnosis of whether the organ was in fact normal or abnormal.

Methods

Thirty fetuses, aged 16–39 gestational weeks and weighing 61–3270 g, underwent postmortem MRI prior to conventional autopsy. MRI studies were reported by two radiologists with access to the clinical and sonographic history: a neuroradiologist, reporting head and neck, and a pediatric radiologist, reporting thorax, abdomen and pelvis. Radiologists used a scale (0 = definitely abnormal, 100 = definitely normal, 50 = unable to comment) to indicate their confidence of anatomical structures being normal or abnormal, using a checklist. Conventional autopsies were performed by pediatric pathologists blinded to the MRI findings, and these were considered the reference standard.

Results

Most normal fetal organs had high scores on postmortem MRI, with median confidence scores above 80. However, the atrioventricular valves, duodenum, bowel rotation and pancreas proved more difficult to assess, with median scores of 50, 60, 60 and 62.5, respectively. Abnormal cardiac atria and ventricles, kidneys, cerebral hemispheres and corpus callosum were always detected with high or moderate degrees of confidence (median scores of 2.5, 5, 0, 0 and 30, respectively). However, the two cases with abnormal cardiac outflow tracts both scored 50, indicating that the radiologist was unable to comment. Kappa values, assessing agreement between MRI diagnoses of abnormality and autopsy, were high for the brain (0.83), moderate for the lungs (0.56) and fair for the heart (0.33).

Conclusions

This scoring system represents an attempt to define the confidence of radiologists in reporting varying degrees of normality and abnormality following ex-utero fetal MRI. While most fetal anatomy is clearly visualized on postmortem MRI, radiologists may lack confidence in reporting such studies, and there are particular problems with assessment of some cardiac and gastrointestinal structures, both normal and abnormal. Copyright © 2006 ISUOG. Published by John Wiley & Sons, Ltd.

Introduction

Fetal and perinatal deaths due to structural or chromosomal abnormality, late miscarriage and stillbirth are common events in obstetric practice, complicating around 1% of pregnancies1. Specialist investigations, including autopsy by an experienced pediatric/fetal pathologist, are often recommended in these circumstances. Even when there is an identifiable clinical cause of death, autopsy findings may modify or correct it2, and inform of the risk of recurrence in cases of fetal abnormality3. Despite this, increasing numbers of parents decline consent for an autopsy1. Furthermore, access to the services of a specialist pediatric pathologist is limited in many localities. In view of the decline in postmortem consent rates, and recent publicity regarding postmortem organ retention, magnetic resonance imaging (MRI) has been proposed as an alternative to conventional perinatal autopsy, and researchers in Europe and the United States4–6 have examined their comparability. Despite increasing, but still limited, worldwide experience7–9, postmortem MRI remains largely an unproven and investigational technique and there remain a number of questions about its feasibility in a clinical context7, 10, 11. It has also been suggested that MRI may be superior in the in-situ assessment of the central nervous system (CNS)5, 10, which may otherwise prove difficult to examine by conventional means due to autolysis or limited parental consent9.

One major question relates to the confidence that radiologists have in reporting such studies, when they may be unfamiliar with both the normal and the abnormal appearance of the fetal anatomy at various gestational ages on postmortem fetal MRI. We therefore developed a ‘confidence score’ to allow radiologists to assess the likelihood of normality or abnormality of a particular organ at the time of reporting, and related these scores to whether the organs were found subsequently to be normal on pathological examination. We also related our system to other scoring systems developed for ultrasound or MRI assessment, in particular the recent studies of Kubik-Huch et al.12 and Levine et al.13.

Methods

We performed a prospective comparative study of postmortem MRI versus conventional autopsy over 2 years, from October 2003 to June 2005, with local research ethics committee approval (Cambridge REC reference 02/004). Our hospital has 5000 maternities per year, and a sub-regional fetal medicine unit, receiving referrals from maternity units across eastern England (delivery population, approximately 24 000/year). The study group consisted of 30 fetuses, comprising those that were either miscarried or stillborn (n = 15) or from pregnancies terminated because of fetal abnormalities (n = 15). Gestational age ranged from 16 to 39 weeks, and birth weight from 61 to 3270 g. Consent was sought for full conventional autopsy including examination of the brain (but not necessarily brain retention for specialist neuropathology), and only when parents gave their consent for such an examination was explicit written consent sought for additional postmortem imaging by MRI prior to the conventional autopsy. Any case in which the parents only gave consent for a limited autopsy, excluding, for example, examination of the brain, was not eligible for recruitment to the study. Following delivery, fetuses were stored in refrigerated compartments, within either the maternity hospital or the hospital mortuary, prior to imaging and autopsy. Figure 1 illustrates the flow of patients through the study.

Figure 1. Patient flow diagram. MRI, magnetic resonance imaging; US, ultrasound.

Postmortem MRI was undertaken by a specialist radiographer, using a GE Signa 1.5T clinical MRI system (GE Healthcare, Milwaukee, WI, USA), with software release 11. Three different receiver coils (head, knee and wrist coils) were used, according to the size of the fetus. T2-weighted sequences were used, in keeping with previous studies on postmortem fetal MRI5, 6, 9, and because, in our experience, T1-weighted sequences add little. Typical scan parameters are given in Table 1.

Table 1. Typical magnetic resonance imaging (MRI) parameters for FSE-XL T2-weighted sequences in the three anatomical orthogonal planes*

| Parameter | Body imaging | Head imaging |
| --- | --- | --- |
| TE effective (ms) | 102 | 102 |
| TR (ms) | 4000–6500 | 15 000 |
| Echo train length | 13–24 | 32 |
| Bandwidth (kHz) | 20.83 | 20.83 |
| Field of view (cm) | 16–28 | 12–16 |
| Slice thickness (mm) | 2–3, no gap | 2, no gap |
| Matrix | 512 × 256, 384 × 256 for axial sequences | 256 × 256 |
| NEX | Minimum of 4 | 5 |
| Phase field of view | 0.5 used to reduce time for axial and sagittal sequences | 0.75 for axial sequences |
| Scan time | 5–8 min each plane | 7 min 30 s |

* Coronal and sagittal sequences include both fetal head and body; separate axial sequences were performed for fetal head and body. NEX, number of excitations; TE, echo time; TR, repetition time.

The whole body of the fetus was imaged, although coverage of the extremities was not performed routinely, as these may be better assessed by plain radiography and external examination14, 15. The typical total scan time was 30–60 min depending on fetal size. Postmortem studies were performed as soon as possible after delivery within the service constraints of the MRI unit and the availability of the specialist MR staff. The postmortem studies were reported by (1) a consultant neuroradiologist with an interest in pediatric MRI, reporting on the brain, head and neck, and (2) a consultant pediatric radiologist, reporting on the body. These radiologists had no significant prior experience of fetal MRI, but had considerable experience of neonatal and pediatric MRI. The reporting radiologists were aware of the clinical and ultrasound history, but were blind to the autopsy findings. They reported their findings using a checklist (Appendix S1, online), analogous to that used by perinatal pathologists, and scored each anatomical structure according to the probability of it being normal. When the radiologist considered an organ or structure to be abnormal they were asked to state the diagnosis.

We developed a simple scale for scoring organs on MRI: a score of 100 was given if the radiologist was certain that the structure was normal, and a score of 0 if they were certain of abnormality. Any structures for which the radiologist felt unable to comment on normality were given a score of 50. Intermediate degrees of confidence about normality and abnormality would score over 50 (probably normal) or below 50 (probably abnormal), respectively, with scores rounded to the nearest 5 or 10. MRI studies were reported at regular sessions, usually within 2 weeks of the study being performed.
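The scale described above can be sketched as a small helper. This is only an illustrative sketch under stated assumptions: the function name `describe_confidence` and the exact label strings are hypothetical and chosen to mirror the wording of the text; they are not part of the study's checklist.

```python
def describe_confidence(score: float) -> str:
    """Map a 0-100 confidence score to the verbal category used in the text.

    Hypothetical helper: the name and the returned labels are illustrative only.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must lie between 0 and 100")
    if score == 100:
        return "definitely normal"
    if score == 0:
        return "definitely abnormal"
    if score == 50:
        return "unable to comment"
    # Intermediate confidence: above 50 leans normal, below 50 leans abnormal.
    return "probably normal" if score > 50 else "probably abnormal"


for s in (100, 80, 50, 30, 0):
    print(s, describe_confidence(s))
```

Keeping 50 as its own category, rather than folding it into either side, preserves the distinction between "probably normal" and "could not be assessed".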

Conventional autopsies were performed by one of two consultant perinatal pathologists working in our hospital, according to Royal College of Pathologists guidelines16. The pathologists were informed of the clinical and ultrasound findings but were blinded to the postmortem MRI findings. Specialist neuropathology was performed when judged appropriate by the referring pathologist, and when explicit parental consent was given.

Organs were classified into those that were found subsequently to be normal and those that were found to be abnormal at autopsy. For each organ system we calculated true-positive and false-negative rates for lesions, together with sensitivity and specificity (Table 2). We also determined weighted Kappa coefficients for those organ systems with significant numbers of anomalies detected at autopsy17. Given the small numbers of abnormal organs in the study population, and the highly selected nature of the population studied, we considered it inappropriate to report positive and negative predictive values. To allow comparison with the results of Kubik-Huch et al.12, any score of 50 or greater was considered normal, and any score below 50 abnormal.
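The dichotomization at 50 and the resulting sensitivity and specificity can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names and the example scores are made up, and "abnormal" is assumed to be the positive class.

```python
from typing import Iterable, Tuple

THRESHOLD = 50  # per the text: scores >= 50 count as normal, scores < 50 as abnormal


def confusion_counts(cases: Iterable[Tuple[float, bool]]) -> Tuple[int, int, int, int]:
    """Return (tp, fn, fp, tn), treating 'abnormal' as the positive class.

    Each case is (mri_confidence_score, abnormal_at_autopsy).
    """
    tp = fn = fp = tn = 0
    for score, abnormal_at_autopsy in cases:
        called_abnormal = score < THRESHOLD
        if abnormal_at_autopsy and called_abnormal:
            tp += 1
        elif abnormal_at_autopsy:
            fn += 1
        elif called_abnormal:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def sensitivity(tp: int, fn: int) -> float:
    return tp / (tp + fn)


def specificity(fp: int, tn: int) -> float:
    return tn / (tn + fp)


# Made-up scores for illustration only; these are not data from the study.
example = [(10, True), (50, True), (90, False), (40, False), (100, False)]
tp, fn, fp, tn = confusion_counts(example)
print(tp, fn, fp, tn)
print(sensitivity(tp, fn), specificity(fp, tn))
```

Under this rule, an abnormal organ scored 50 ("unable to comment") counts as a false negative, because 50 falls on the normal side of the cut-off.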

Table 2. Lesion detection and performance of postmortem magnetic resonance imaging (MRI) compared with autopsy

| Body part | Lesions at autopsy (n) | True positive (n) | False negative (n) | Abnormal organs at autopsy (n) | Abnormal organs at MRI (n) | Sensitivity (%) | Specificity (%) | Kappa |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Brain | 12 (8 fetuses) | 12/13 | 1/13 | 8 | 7 | 87.5 | 95.5 | 0.83 |
| Spinal canal | 0 | 0 | 0 | 0 | 0 | NA | NA | NA |
| Heart | 15 (8 fetuses) | 7/15 | 8/15 | 8 | 2 | 25 | 100 | 0.33 |
| Lungs | 8 (8 fetuses) | 5/8 | 3/8 | 8 | 3 | 62.5 | 87.0 | 0.56 |
| Liver and spleen | 0 | 0 | 0 | 0 | 1 | NA | NA | NA |
| Urinary tract | 2 (1 fetus) | 1 | 1 | 2 | 1 | NA | NA | NA |

NA, not applicable (too few cases).

Results

Postmortem MRI was performed within 2 days of delivery in 80% of cases. The median time to imaging was 1 (range, 0–4) day. The median interval from delivery to autopsy was 4 days, and 95% of autopsies were performed within 7 (range, 2–11) days from delivery. The corresponding intervals from MRI to autopsy were 1–8 days, with a median of 2.5 days.

The normality or abnormality of organs was not known at the time of MRI reporting. The discriminatory ability of MRI to distinguish between abnormal and normal organs is shown in Table 2, Figure 2 and Appendix S2 online.

Figure 2. Confidence scores on magnetic resonance imaging for organs confirmed normal (a) and abnormal (b) at autopsy. Median and range are indicated.

While many anatomical structures could be evaluated with a high degree of confidence, others could not. Scores are shown separately for those organs found to be normal at autopsy (Figure 2a) and those found to be abnormal (Figure 2b). Those organs for which normality and abnormality could not be discriminated easily scored around 50.
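The median and range shown for each structure can be derived as in the sketch below. The scores are hypothetical and the helper name `summarize` is an assumption for illustration; the study's per-case scores are not reproduced here.

```python
from statistics import median


def summarize(scores):
    """Return (median, minimum, maximum) for a non-empty list of scores."""
    return median(scores), min(scores), max(scores)


# Hypothetical scores for one structure; these are not data from the study.
normal_at_autopsy = [100, 90, 80, 90, 50]
abnormal_at_autopsy = [0, 5, 50]

print("normal:  ", summarize(normal_at_autopsy))
print("abnormal:", summarize(abnormal_at_autopsy))
```

With an even number of cases the median is the mean of the two middle scores, which is how half-step values such as 2.5 or 62.5 can arise from scores rounded to the nearest 5 or 10.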

The cerebral hemispheres, spine and the cardiac situs were among the anatomical features assessed with highest confidence, with median confidence scores of 90 or over for normal structures. Conversely, the atrioventricular valves, duodenum and bowel rotation proved among the most difficult features to assess, scoring between 50 and 60 on average. Distinguishing between normal and abnormal appeared easiest for the corpus callosum, cerebral hemispheres and kidneys (Figure 2). Abnormal cardiac chambers were usually distinguished from normal ones easily, but other cardiac structures, such as the outflow tracts and the ventricular septum, proved more difficult.

In the evaluation of the neck and thorax, the atrioventricular valves and trachea proved difficult to assess, with median scores for normal structures of 50 and 80, respectively (but with a wide range). Among abdominal structures, bowel rotation, the pancreas and the duodenum proved difficult to report with confidence, with median confidence scores of around 60 for normal cases.

Table 2 shows the ability of our radiologists to detect normal and abnormal organs. These data confirm a high detection rate for CNS anomalies (92.3%) and a low false-negative rate: in one case, mild ventriculomegaly found at autopsy had been reported incorrectly as normal on MRI. There were, however, a number of false-positive findings in the brain. All but one of these false-positive findings were in fetuses with other major CNS malformations.

Only seven of 15 cardiac lesions found in eight fetuses at autopsy were identified at postmortem MRI. There were, however, no false-positive cardiac diagnoses. Fetal lung hypoplasia was usually detected easily. However, abnormalities of lung lobation were never detected. There were two false-positive lung diagnoses: in one case the radiologist had a low degree of confidence of abnormal (reduced) lung volumes, but this was not confirmed at autopsy; in the other, abnormal lung signal was suspected, but the macroscopic appearance and histology of the lungs were normal.
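The lesion-level detection rates quoted in this and the preceding paragraph can be recomputed from the counts in Table 2. The sketch below is illustrative; the dictionary layout and the function name `detection_rate` are assumptions.

```python
# Lesion counts as reported in Table 2: (lesions detected on MRI, lesions at autopsy).
lesions = {
    "brain": (12, 13),
    "heart": (7, 15),
    "lungs": (5, 8),
}


def detection_rate(detected: int, total: int) -> float:
    """Percentage of autopsy-confirmed lesions that were identified on MRI."""
    return round(100 * detected / total, 1)


for organ, (detected, total) in lesions.items():
    print(f"{organ}: {detected}/{total} = {detection_rate(detected, total)}%")
```

These are per-lesion rates. They differ from the per-organ sensitivities in Table 2, which count each organ system once per fetus.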

Two false-positive diagnoses were made in the fetal liver and kidneys. In one 17-week fetus, the reporting radiologist considered the liver to be abnormally large, while it was reported to be autolyzed but otherwise normal at autopsy; in the other case the fetal kidneys appeared abnormally shaped on MRI, but were reported as normal at autopsy.

Kappa scores for agreement between postmortem MRI and autopsy findings were 0.83 for the brain, 0.33 for the heart and 0.56 for the lungs.
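As a worked check, the Kappa values for the brain and heart can be reproduced from two-by-two tables reconstructed from Table 2. The cell counts below are an assumption: they are inferred from the reported numbers of abnormal organs, sensitivity and specificity, taking all 30 fetuses as contributing one observation per organ system. For a two-category table, weighted and unweighted Kappa coincide, so the plain Cohen's Kappa formula is used.

```python
def cohen_kappa(tp: int, fn: int, fp: int, tn: int) -> float:
    """Cohen's kappa for a 2x2 agreement table (MRI call vs autopsy finding)."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Chance agreement: product of marginal totals for each category, summed.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return (observed - expected) / (1 - expected)


# (tp, fn, fp, tn) inferred from Table 2 with n = 30; an assumption, not published cell counts.
tables = {
    "brain": (7, 1, 1, 21),   # sensitivity 7/8 = 87.5%, specificity 21/22 = 95.5%
    "heart": (2, 6, 0, 22),   # sensitivity 2/8 = 25%, specificity 22/22 = 100%
}

for organ, cells in tables.items():
    print(organ, round(cohen_kappa(*cells), 2))
```

The heart example shows why Kappa is informative here: observed agreement is 80%, but most of that would be expected by chance given how few organs were called abnormal, so Kappa is only 0.33.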

Discussion

The use of postmortem MRI to determine fetal abnormalities after stillbirth or termination of pregnancy is not new; however, there has been very little formal study of which fetal organs can be assessed with accuracy using MRI, and which are difficult or impossible to assess. Clinicians counseling patients about the relative merits of MRI or conventional autopsy should be aware of the diagnostic uncertainty about which organs can be assessed reliably on postmortem MRI. Radiologists considering providing an MRI autopsy service for bereaved parents should also be cognisant of these issues.

The sensitivity of ultrasound to detect fetal anomalies has been and remains the subject of research and audit18–20. To date, however, there have been more limited attempts to examine objectively what can and what cannot be seen and reported confidently in the context of fetal MRI, yet the use of MRI for fetal imaging in-utero and as a ‘less invasive’ autopsy is becoming more widespread11, 21. Levine et al.13, using in-utero fast fetal MRI, reported a four-point scale to assess the ‘conspicuity’ of anatomy, ascribing a score of 1 to indicate that the organ could not be seen, 2 to indicate that less than half of the organ or structure was seen, 3 to indicate that more than half of the structure was seen, and 4 to indicate that the entire organ could be seen. While being important in evaluating the ability of fetal MRI to image specific organs, conspicuity of anatomy as assessed by Levine et al.13 does not equate to an assessment of normality or abnormality.

Kubik-Huch et al.12 reported on in-utero fetal MRI studies in 30 pregnancies, and used a five-point scale to assess diagnostic confidence in fetal anatomy. In their scale, 1 was normal, 2 was probably normal, 3 was inconclusive, 4 was probably abnormal and 5 was abnormal. The use of such a scale allows a semi-quantitative approach to assessing technical improvements in postmortem MRI, for example, with high-field magnet technology, or the use of different receiver coils, as Figure 3 illustrates.

Figure 3. Coronal T2-weighted images of a 21-week fetus with severe cerebral ventriculomegaly: (a) using knee coil; (b) using a customized four-channel phased array coil, with improved resolution of thoracic and upper abdominal anatomy.

The scale that we used may be considered a development of that used by Kubik-Huch et al.12, assessing the confidence that radiologists had in the normality or abnormality of anatomical structures; this assessment most closely resembles their judgements in reporting an MRI study. In devising our system, we used a 100-point scale. We considered that this might more easily distinguish the different degrees of diagnostic confidence possible for different organs in comparison with the five-point scale used by Kubik-Huch et al.12. A confidence scale rather than a conspicuity scale allows the radiologist to use the other information available to them. For example, many organs that may not be seen clearly, such as the ureters, are likely to be normal even when they cannot be assessed directly on MRI. Although the possibility of ureteric stenosis or atresia remains, this is unlikely in the absence of hydronephrosis. The confidence scores reported in this study therefore give useful preliminary information about the ability of MRI to distinguish normal from abnormal. The study design was, however, somewhat limited in that the number of normal organs was considerably greater than that of abnormal ones (Table 2 and Figure 2).

The discriminatory power of MRI confidence scores appeared to be high in distinguishing between normal and abnormal cardiac atria and ventricles, but was relatively poor for the atrial and ventricular septa and outflow tracts (Figure 2). The difficulties we describe, and those presented by others, mean that postmortem MRI may not as yet be used reliably for confirmation of structural heart disease (except in the most obvious cases) in fetuses or newborns. We would therefore caution against the use of postmortem MRI in assessing the diagnostic accuracy of fetal echocardiography22, although other investigators have reported promising results23, 24. Again, clinicians and parents need to be aware of these issues for counseling. The Kappa statistic for cardiac anomalies was 0.33, which may be interpreted as indicating only ‘fair agreement’25.

Other thoracic organs, such as the trachea and esophagus, could not always be evaluated with confidence. In our series, there were no abnormalities of these organs (e.g. tracheoesophageal fistula) identified at autopsy; such abnormalities often go undetected on obstetric ultrasound26. Although one fetus had abnormal bowel rotation (and scored 50 for this on MRI), there were no other major gastrointestinal (GI) tract pathologies (e.g. duodenal atresia). We are therefore unable to assess the performance of MRI in the assessment of other GI tract anomalies, although previous studies have suggested this may be poor5, 7.

A small number of false-positive diagnoses were made: reduced fetal lung volume was suspected in one case. At present we are unable to comment on the significance (if any) of apparently abnormal lung signal on postmortem MRI when histology from the corresponding lung appears normal. Similarly, the liver appeared abnormally large in one case, but had normal weight at autopsy. Such false-positive diagnoses could perhaps be excluded by performing fetal lung and liver volumetry, although this would be expected to significantly increase reporting time27, 28. The Kappa statistic for lung anomalies was 0.56, which may be interpreted as indicating ‘moderate agreement’25.

In contrast to difficulties elsewhere, the CNS could be assessed with a high degree of confidence, in keeping with studies showing that postmortem MRI has very high sensitivity, specificity and positive and negative predictive values for structural anomalies within the CNS9. The performance of postmortem MRI in assessing the CNS in this study was comparable to that described by Kubik-Huch et al.12. As the confidence scores in Figure 2 show, the discriminatory power of MRI to distinguish between normal and abnormal hemispheres, ventricular system and corpus callosum is good. This is particularly important as macroscopic and microscopic information obtained by conventional autopsy (often performed several days after stillbirth or termination of pregnancy) may be limited by autolysis, and this may be particularly severe after intracardiac injection of potassium chloride into the fetal heart.

There were several CNS cases in this study in which there was significant disagreement between the MRI findings (high confidence of abnormality) and those at autopsy (reported as normal). Such false-positive findings usually occurred in the context of other major CNS anomalies, such as severe hydrocephalus or holoprosencephaly, in which the anatomy may become distorted. These may be difficult to resolve if we consider autopsy as the reference or ‘gold standard’ investigation. Such disagreement has been described before by Griffiths et al.; in their study, consensus was usually reached in favor of MRI, rather than the pathological findings9. However, in our study, the doctors reporting MRI and autopsies had no knowledge of the autopsy or MRI results, respectively. We specifically did not attempt to reach a consensus diagnosis; it remains to be resolved whether conventional autopsy in these circumstances should indeed be the gold standard, as autolysis and damage arising during removal of the fragile brain may limit the information obtained at neuropathology. Postmortem MRI could prove a useful adjunct (or even be superior) to neuropathology in such cases11, 29. The Kappa statistic for brain anomalies was 0.83, which may be interpreted as indicating ‘very good agreement’25, and was in keeping with previously published findings9.
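The verbal labels attached to the three Kappa values in this discussion ("fair", "moderate", "very good") are consistent with a commonly used banding of Kappa. The sketch below is illustrative only: the cut-offs are an assumption, chosen because they reproduce the labels used in the text, and are not taken from the cited source.

```python
def agreement_label(kappa: float) -> str:
    """Translate a kappa value into a verbal agreement band.

    Assumed cut-offs (a widely used convention, not confirmed by this paper):
    <= 0.20 poor, <= 0.40 fair, <= 0.60 moderate, <= 0.80 good, else very good.
    """
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


for organ, kappa in (("heart", 0.33), ("lungs", 0.56), ("brain", 0.83)):
    print(organ, kappa, agreement_label(kappa))
```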

It seems likely that, with increasing experience and familiarity with fetal MRI studies, and as MRI sequences and techniques are refined, the confidence of our radiologists in reporting these studies will increase. The number of false-positive diagnoses might be expected to fall with increasing familiarity as well. We have not examined the reproducibility or repeatability of the confidence scale that we report, which is a significant limitation of this study. The next step would be to determine intra- and interobserver variability with a panel of radiologists reporting independently. Because of the specialist nature of this type of examination, this would, however, require at least a national, and possibly an international, collaboration. It is of course possible that radiologists with different expertise and experience of both in-utero and postmortem fetal MRI would score different organs with greater or lesser degrees of confidence. Our data underscore the difficulties that radiologists may have when providing postmortem MRI as a clinical service. Further work, such as the proposed UK Department of Health's ‘Less Invasive Autopsy’ studies, may clarify the role of postmortem MRI in perinatal medicine as an adjunct to or replacement for conventional autopsy.

Acknowledgements

Members of the Cambridge post-mortem MRI study group also include: Dr A. F. Dean (Department of Histopathology, Addenbrooke's Hospital), Dr N. Coleman (Hutchison-MRC Centre, Cambridge) and Dr L. Berman (Department of Radiology, University of Cambridge). We are indebted to the parents who agreed to take part in this study at a very difficult time. We also acknowledge the assistance of the pediatric pathology service staff of Addenbrooke's Hospital: N. Wood, G. Kenyon, S. Brown and M. Macer. We are also grateful for the support of Dr B.-H. Lim, Hinchingbrooke Hospital, Dr C. P. Spencer, St John's Hospital, Chelmsford and Dr M. Sule, Ipswich Hospital. We also wish to thank the midwives and medical staff of the Rosie Hospital, Cambridge for their support. This study was supported by a grant from the Trustees of the Addenbrooke's Charities and the Fund for Addenbrooke's. ACGB's salary was in part funded by Cambridge Fetal Care.

Appendix S1

Checklist used by radiologists during the study. Each anatomical structure was scored according to the probability of it being normal, on a scale of 0 (certainly abnormal) to 100 (certainly normal). A score of 50 indicated uncertainty with respect to the normality of the structure. When an organ or structure was scored as abnormal the radiologist was asked to state the diagnosis.

Appendix S2

Median and range of confidence scores for organs on magnetic resonance imaging that were found subsequently to be normal (a) and abnormal (b) at autopsy.
