Reliability of two techniques for assessing cerebral iron deposits with structural magnetic resonance imaging


  • Maria C. Valdés Hernández PhD,

    Corresponding author
    1. SINAPSE Collaboration, SFC Brain Imaging Research Centre, Department of Clinical Neurosciences, University of Edinburgh, Edinburgh, UK
    2. Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh, UK
    • SFC Brain Imaging Research Centre, Image Analysis Lab, Division of Clinical Neurosciences, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
  • Tina H. Jeong XX,

    1. School of Medicine and Veterinary Medicine, University of Edinburgh, Edinburgh, UK
  • Catherine Murray MA (Hons),

    1. Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh, UK
    2. Department of Psychology, University of Edinburgh, Edinburgh, UK
  • Mark E. Bastin DPhil,

    1. SINAPSE Collaboration, SFC Brain Imaging Research Centre, Department of Clinical Neurosciences, University of Edinburgh, Edinburgh, UK
    2. Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh, UK
    3. Medical and Radiological Sciences (Medical Physics), University of Edinburgh, Edinburgh, UK
  • Francesca M. Chappell PhD,

    1. SINAPSE Collaboration, SFC Brain Imaging Research Centre, Department of Clinical Neurosciences, University of Edinburgh, Edinburgh, UK
  • Ian J. Deary MB, ChB, PhD (Edin),

    1. Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh, UK
    2. Department of Psychology, University of Edinburgh, Edinburgh, UK
  • Joanna M. Wardlaw MB ChB, MD

    1. SINAPSE Collaboration, SFC Brain Imaging Research Centre, Department of Clinical Neurosciences, University of Edinburgh, Edinburgh, UK
    2. Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh, UK



Purpose

To test the reliability of two computational methods for segmenting cerebral iron deposits (IDs) in the aging brain, given that measuring IDs with magnetic resonance imaging (MRI) is challenging because other minerals, especially calcium, produce a similar effect on T2*-weighted sequences.

Materials and Methods

T1-, T2*-weighted, and fluid-attenuated inversion recovery (FLAIR) MR brain images obtained at 1.5T from 70 subjects in their early 70s who displayed a wide range of brain IDs were analyzed. The first segmentation method used a multispectral approach based on the fusion of two or more structural sequences registered and mapped in the red/green color space followed by Minimum Variance Quantization. The second method employed a combined thresholding, size and shape analysis using T2*-weighted images augmented with visual information from T1-weighted data.


Results

Both segmentation techniques had high intra- and interobserver agreement (95% confidence interval [CI] = ± 57 voxels in a range from 0 to 1800), which decreased in subjects with significant microbleeds and/or IDs. However, the thresholding method was more observer dependent than the multispectral approach in identifying the boundaries of microbleeds and IDs.


Conclusion

Both techniques proved to be in agreement and have good intra- and interobserver reliability. However, they have limitations, specifically with regard to automation and observer independence, so further work is required to develop fully user-independent methods of identifying cerebral IDs. J. Magn. Reson. Imaging 2011;33:54–61. © 2010 Wiley-Liss, Inc.

IRON IS TYPICALLY STORED in the body as a soluble oxyhydroxide in the protein ferritin, which controls the release of iron and prevents toxic damage to tissues. However, an insoluble form of iron oxyhydroxide, hemosiderin, can accumulate in tissues in some diseases, leading to organ damage. These iron deposits (IDs) occur as hemosiderin in the brain after intraparenchymal hemorrhage, on the surface of the brain following subarachnoid or subdural hemorrhage (superficial siderosis), and in microhemorrhages in brain tissue. In addition, iron may be deposited in the walls of small blood vessels, for example, in the perforating arterioles as they enter the brain substance in the inferior part of the putamen (1).

Cerebral IDs are of increasing interest due to their association with some diseases (2–4), mood disorders (5, 6), and cognition (7). In addition to iron, other metals and compounds including calcium and manganese may be deposited in the brain in pathological states. For example, in the case of putaminal IDs, pathology studies show that small arteriolar mineralization has staining properties of both iron and calcium (1, 4). These paramagnetic substances can affect the longitudinal (T1) and transverse (T2/T2*) relaxation times of mobile water protons predominantly through the outer sphere mechanism (8), with iron producing a reduction in T2/T2* relaxation time with little effect on T1, and calcium and manganese also reducing T2/T2* but in addition altering T1 depending on the pathological state (8). A shortening of T2/T2* relaxation time produces hypointensity on T2-weighted (T2W) and T2*-weighted (T2*W) magnetic resonance imaging (MRI), while altering T1 produces either hyperintensity or hypointensity on T1-weighted (T1W) MRI. Since calcium is hypointense on T2*W and T1W MRI and iron is hypointense on T2*W but not T1W, this enables differentiation of these two minerals using structural brain MRI.

The assessment of IDs by MRI offers novel and useful applications for diagnosis, longitudinal monitoring, and testing of new therapies for brain disorders. Several studies assessed brain microbleeds (BMBs) (9) and hemosiderin deposits visually (2, 3, 6, 10), and attempts have been made to segment these regions using computational image processing methods. For example, a Fourier-based method to differentiate iron and calcium was proposed by T. Freeman (“MRI analysis: a study of uncertainty”; http://medicalphysics, who used a Green's function to calculate magnetic susceptibilities from given field distributions. More recently, the use of phase imaging has been investigated in an attempt to assess areas with IDs (11). However, both these methods are emerging techniques whose accuracy and utility are still to be determined.

In this study we implemented and tested two techniques for segmenting brain IDs from a cohort of relatively healthy subjects in their early 70s using structural MRI data. These methods have previously been used to segment white matter hyperintensities (WMHs) in several studies of normal aging and have been found to be reliable (12–14). The first method, which is considered the gold standard technique, is conventional thresholding followed by manual editing. The second is a multispectral technique that has been shown to segment brain tissue and WMHs accurately in aging and pathological brain MRI data (15). Here we examine their intra- and interobserver repeatability and reliability in segmenting IDs for research and clinical use.


MRI Data

Structural T1W, T2*W, and fluid-attenuated inversion recovery (FLAIR) MRI data were obtained from 325 community-dwelling older subjects, born in 1936 and scanned at the age of 70 to 71 years, who were participating in a study of cognitive aging. MRI data were acquired on a GE 1.5T HDX clinical scanner and all subjects gave formal written consent. Table 1 displays the sequence parameters used to scan the participants; the slice location was contiguous in all cases. All sequences were registered to the T2W scan using FLIRT (FMRIB, Oxford, UK; http://www.fmrib. giving a final image matrix of 256 × 256.

Table 1. Parameters for Each Structural Sequence Used

Parameter              T1W          T2W           T2*W         FLAIR
TR/TE/TI (ms)          9.8/4/500    11320/104.9   940/15       9002/147.38/2200
Slice thickness (mm)   1.3          2             2            4
Bandwidth (kHz)        15.63        20.83         12.50        15.63
FOV (mm)               256 × 256    256 × 256     256 × 256    256 × 256
Measurement time       8 min 12 s   3 min 35 s    5 min 53 s   5 min 25 s
Number of slices       160          80            80           40
Matrix                 192 × 192*   256 × 256     256 × 256    256 × 256

The slice gap was zero in all sequences.
* Zero-filled to 256 × 256.

Visual Rating of Iron Deposits

IDs were visually assessed in 325 participants by an image analyst with more than 15 years' experience using the following three visual rating scales. The Brain Observer MicroBleed Scale (BOMBS) minimizes observer variation in the assessment of BMBs in clinical practice (16). Since there is no established, validated visual scale for categorizing putaminal IDs, we developed a simple one, the “Putaminal ID Visual Rating Scale.” This rates putaminal IDs from none to medium-high based on comparison with four standard cases (Fig. 1).

Figure 1.

Putaminal Iron Deposits Scale. T2*W axial slices that show, from left to right, a representative example of loads 1, 2, 3, and 4, respectively.

We also developed a General Visual Rating Scale to quantify all IDs, ie, microbleeds, superficial siderosis, old parenchymal hemorrhages, and putaminal IDs. This scale encompassed all deposits considered to represent iron, regardless of location, ie, on any slice on which the brain was visible, and included all sizes from small BMBs to large iron deposits in the basal ganglia and remnants of old hemorrhages, regardless of shape, eg, round BMBs to long, narrow areas of cortical siderosis. IDs were graded as: 0 (none), ie, absence of any visible iron deposition; 1 (mild), ie, five or fewer BMBs or deposits whose extent was estimated to be less than 50 mm³; 2 (moderate), ie, from 6 to 30 BMBs or deposits whose extent was estimated to be between 50 mm³ and 200 mm³; and 3 (severe), ie, more than 30 BMBs and/or deposits whose extent was estimated to be more than 200 mm³. As a guide to estimate the volume of IDs, one BMB had, on average, a volume of 8 mm³. These visual rating scales provided a simple method for describing BMBs and IDs in our population.
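The grading rules of the General Visual Rating Scale can be written down as a small function. This is only an illustrative sketch: the scale itself was applied visually, and the way the count-based and volume-based criteria are combined here (taking the higher of the two grades) is an assumption, as are the function and constant names.

```python
# Hypothetical helper names; the thresholds are the ones stated in the text.
MEAN_BMB_VOLUME_MM3 = 8.0  # average volume of one brain microbleed (from the text)


def grade_from_count(n_bmbs: int) -> int:
    """Grade implied by the number of brain microbleeds (BMBs)."""
    if n_bmbs > 30:
        return 3  # severe: more than 30 BMBs
    if n_bmbs >= 6:
        return 2  # moderate: 6 to 30 BMBs
    if n_bmbs >= 1:
        return 1  # mild: five or fewer BMBs
    return 0      # none


def grade_from_volume(volume_mm3: float) -> int:
    """Grade implied by the estimated extent of the deposits in mm^3."""
    if volume_mm3 > 200:
        return 3  # severe: more than 200 mm^3
    if volume_mm3 >= 50:
        return 2  # moderate: between 50 and 200 mm^3
    if volume_mm3 > 0:
        return 1  # mild: less than 50 mm^3
    return 0      # none


def general_id_grade(n_bmbs: int, deposit_volume_mm3: float = 0.0) -> int:
    """Assumed combination rule: the higher of the two grades wins."""
    return max(grade_from_count(n_bmbs), grade_from_volume(deposit_volume_mm3))


def estimated_bmb_volume(n_bmbs: int) -> float:
    """Rough volume guide: number of BMBs times the mean BMB volume."""
    return n_bmbs * MEAN_BMB_VOLUME_MM3
```

For instance, a subject with 10 BMBs and an 80 mm³ putaminal deposit would be graded 2 (moderate) under this sketch.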

Scan Selection

To evaluate the two segmentation methods, we randomly selected a subset of 70 subjects from the original 325 participants who displayed the full range of IDs based on the above visual rating scales: from none to significant BMBs and other IDs, eg, deposits in the basal ganglia, midbrain, old hemorrhages, superficial siderosis, etc. We were careful to ensure that there were equal proportions of subjects in each iron load category and that all types of IDs were included (Table 2). The negative values of the kurtosis of the sample on both General and Putaminal Visual Rating Scales indicate that all grades of IDs on each scale were represented approximately equally throughout the sample.

Table 2. Statistical Characteristics of the 70 Subjects Grouped According to the Iron Load Rated by Two Visual Scales

Scale                      Statistics of the sample distribution   Load   Number of subjects   Percent in the sample
Basal ganglia (0 to 4)     70 subjects                             0      14                   20
                           Mean: 1.69                              1      27                   38.6
                           Skewness: 0.522                         2      8                    11.4
                           Kurtosis: −1.052                        3      9                    12.9
Whole iron load (0 to 3)   70 subjects                             0      14                   20
                           Mean: 1.39                              1      27                   38.6
                           Kurtosis: −0.957                        3      12                   17.1

For both scales, the median value for the group was 1, the standard error for skewness was 0.287 and the standard error for kurtosis was 0.566.
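The standard errors quoted with Table 2 depend only on the sample size. As a check, the sketch below evaluates the usual small-sample formulas for the standard error of skewness and of kurtosis at n = 70; that these are the formulas the statistics package applied is an assumption, but they reproduce the reported values.

```python
import math


def se_skewness(n: int) -> float:
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n: int) -> float:
    """Standard error of sample (excess) kurtosis for a sample of size n."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1.0) / ((n - 3) * (n + 5)))


print(round(se_skewness(70), 3), round(se_kurtosis(70), 3))  # 0.287 0.566
```

A kurtosis of about −1 is roughly twice its standard error below zero, which is consistent with a flatter-than-normal spread of subjects across the load grades.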

Imaging Features

Figure 2A shows an axial slice from T2*W and FLAIR scans in a representative subject. This shows that the main signal intensity level groups in each sequence are associated with cerebrospinal fluid (CSF), which produces the highest intensities in T2*W and the lowest in FLAIR, and WMHs which result in medium to high signal intensities in T2*W and high intensities in FLAIR. Normal-appearing brain tissues, ie, gray matter and white matter, are in the medium range of intensities in both sequences. As both segmentation techniques rely on differences in signal intensity levels to differentiate features of interest and therefore might be biased by brains with high WMH load or atrophy that would increase the amount of CSF signal, we first tested whether there was any association between background signal intensities from CSF or WMHs and the volume of iron-containing tissue in all 325 subjects. CSF and WMHs were quantified using the MCMxxxVI technique following an extensive validation process (15).

Figure 2.

A: T2*W (left) and FLAIR (right) axial slices of a subject with mineralized basal ganglia. B: Resulting fused image in the red/green color space.

Differentiation of Iron and Other Minerals

T1W scans were used to differentiate areas with calcium deposits, ie, signal change compared with no signal change for IDs. As described below, T2*W sequences were used in isolation in the thresholding technique and combined with FLAIR in MCMxxxVI to segment regions with high iron content.
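The decision rule described here can be sketched as a simple function. In the study this judgment was made visually, so the numeric cutoffs below are purely hypothetical, as is the idea of expressing each region as a ratio of its mean intensity to that of normal-appearing tissue.

```python
def classify_deposit(t2star_ratio: float, t1_ratio: float,
                     hypo_cutoff: float = 0.6, t1_tolerance: float = 0.15) -> str:
    """Label a region from its intensity relative to normal-appearing tissue.

    t2star_ratio, t1_ratio: region mean / normal tissue mean on each sequence.
    hypo_cutoff, t1_tolerance: hypothetical values chosen for illustration only.
    """
    if t2star_ratio >= hypo_cutoff:
        return "not a deposit"       # not hypointense on T2*W
    if abs(t1_ratio - 1.0) > t1_tolerance:
        return "calcium-dominant"    # hypointense on T2*W with T1W signal change
    return "iron"                    # hypointense on T2*W, no T1W signal change
```

Under these assumed cutoffs, `classify_deposit(0.3, 1.0)` returns `"iron"` and `classify_deposit(0.3, 0.7)` returns `"calcium-dominant"`.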

Two trained observers, blind to each other's findings and to the rating scores, applied the two segmentation methods to measure ID volumes in the 70 subjects. One observer repeated the analysis with MCMxxxVI, blind to other results and without referring to the original T2*W sequence, and tested the effect that simultaneous visual assessment of T2*W images had on the segmentation obtained using this method.

Iron Volume Measurement: Multispectral Segmentation Method (MCMxxxVI)

In MCMxxxVI, T2*W and FLAIR sequences were registered using affine linear registration in FLIRT (17). The intensity values of the scans were adjusted to optimize their contrast before the scans were fused to obtain a volume in the RG color plane (Fig. 2B). This step guarantees that when the registered images are transformed into the hue, saturation and value (HSV) color space (18) with an angle of 120°, ie, red and green colors, the features to be segmented are far enough from the V axis, S = 0 for any value of H, that the model which describes this transformation will not become undefined. A brain mask was then obtained from the T2*W data using the Object Extraction Tool in Analyze 8.1 (, which applies thresholding, morphological erosion, dilation, and region growing steps to separate the brain from the skull. To segment and quantify the volume of IDs, Minimum Variance Quantization (MVQ) was applied as implemented in the MATLAB (MathWorks, Natick, MA) function “rgb2ind”, which converted the fused RG T2*W and FLAIR scans into clustered sequences in the same RG color space. Previously (15), we found that 32 clusters was the optimum choice for achieving a good segmentation, and mapped the 32 clusters in a normalized graph of the RG space. We determined the clusters in the range of green that best discriminated the hemosiderin areas through interactive sampling. The segmentation was done automatically, followed by manual removal of false-positives where required. (Note: MCMxxxVI is not designed for counting features, ie, the number of IDs, and was therefore used only for measuring ID volume. An additional processing step for object recognition using morphological operations would be needed for MCMxxxVI to be used to count objects.)
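The fusion and clustering steps can be illustrated with a minimal pure-Python sketch. It is not the MCMxxxVI implementation: the quantizer below is a simplified greedy variance-splitting scheme standing in for the minimum variance quantization performed by “rgb2ind”, the min-max contrast adjustment is a placeholder, and the assignment of T2*W to red and FLAIR to green is an assumption.

```python
def normalize(image):
    """Flatten a 2D image (list of rows) and rescale it to the range [0, 1]."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in flat]


def fuse_red_green(t2star, flair):
    """Fuse two registered images into (red, green) pairs, one per pixel.

    Assumption: T2*W drives the red channel and FLAIR the green channel.
    """
    return list(zip(normalize(t2star), normalize(flair)))


def variance_split_quantize(pixels, n_clusters=32):
    """Greedy stand-in for minimum variance quantization.

    Repeatedly splits, at its mean, the cluster and channel with the largest
    sum of squared deviations, until n_clusters is reached or nothing can be
    split any further. Returns one cluster label per pixel.
    """
    clusters = [list(range(len(pixels)))]
    while len(clusters) < n_clusters:
        best = None  # (sum of squared deviations, cluster index, axis, mean)
        for ci, members in enumerate(clusters):
            for axis in (0, 1):
                vals = [pixels[i][axis] for i in members]
                mean = sum(vals) / len(vals)
                # Only splits that leave both halves non-empty are candidates.
                if not (min(vals) <= mean < max(vals)):
                    continue
                sse = sum((v - mean) ** 2 for v in vals)
                if best is None or sse > best[0]:
                    best = (sse, ci, axis, mean)
        if best is None:
            break  # every cluster is uniform; no further split is possible
        _, ci, axis, mean = best
        members = clusters.pop(ci)
        clusters.append([i for i in members if pixels[i][axis] <= mean])
        clusters.append([i for i in members if pixels[i][axis] > mean])
    labels = [0] * len(pixels)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

In a workflow of this kind, the labels whose cluster centers fall in the color range associated with hemosiderin would then be selected interactively, and the number of pixels carrying those labels gives the ID volume in voxels.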

Iron Volume Measurement: Thresholding Method

After extracting the brain using the T2*W volume as described above, bias field correction was employed to minimize the effect of signal intensity drop-off near the edges of the T2*W images using the Guillemaud filter (19). Next, a slice was selected with significant BMBs or putaminal IDs, ideally with a variety of shapes and intensities, to allow the intensity threshold to be adjusted, between zero and less than half of the maximum intensity value, for optimal segmentation of areas with low intensity. An estimated maximum and minimum size of the hypointense “objects” was then adjusted interactively. The hypointense areas on T2*W images that satisfied the requirements of maximum size, circularity, and specified threshold range consistent with IDs were extracted using the “Object Counter” module in Analyze 8.1. Once the T2*W segmentation was complete the slice in the registered T1W volume corresponding with that used in the T2*W segmentation was visually examined to identify areas where calcium dominated, ie, signal change on T1W and hypointense on T2*W (8, 15). These calcium-dominant areas were removed manually as well as any false-positives, eg, blood vessels and choroid plexus.
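The combination of thresholding with size and shape criteria can be sketched on a single 2D slice as follows. This does not reproduce the “Object Counter” module: the 4-connected labeling, the pixel-edge perimeter, and the circularity measure 4π·area/perimeter² are assumptions made for illustration, and all parameter names are hypothetical.

```python
import math
from collections import deque

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity


def segment_hypointense(slice2d, mask, threshold, min_size, max_size,
                        min_circularity):
    """Return the hypointense objects (sets of (row, col)) that pass the
    size and circularity criteria.

    slice2d: 2D list of intensities; mask: 2D list of booleans (brain mask).
    threshold: intensities below this value are candidate voxels.
    """
    rows, cols = len(slice2d), len(slice2d[0])
    candidate = [[bool(mask[r][c]) and slice2d[r][c] < threshold
                  for c in range(cols)] for r in range(rows)]
    seen = set()
    kept = []
    for r in range(rows):
        for c in range(cols):
            if not candidate[r][c] or (r, c) in seen:
                continue
            # Breadth-first search collects one connected object.
            obj = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                obj.add((y, x))
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and candidate[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            area = len(obj)
            # Perimeter: number of pixel sides not shared with the object.
            perimeter = sum((y + dy, x + dx) not in obj
                            for y, x in obj for dy, dx in NEIGHBOURS)
            circularity = 4.0 * math.pi * area / perimeter ** 2
            if min_size <= area <= max_size and circularity >= min_circularity:
                kept.append(obj)
    return kept
```

With this pixel-edge perimeter, a square object scores about 0.79 and elongated objects score lower, so a cutoff near 0.6 would reject thin vessel-like structures in this toy setting. Calcium-dominant areas and remaining false-positives would still need to be removed by inspecting the T1W data, as described above.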

Statistical Analysis

SPSS 14.0 (Chicago, IL) was used to perform the statistical analyses. All variables were assessed for normality using the Kolmogorov–Smirnov test. The inter- and intraobserver repeatability of both segmentation methods was assessed using Bland–Altman analysis (20) in the sample of 70 subjects.
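For readers unfamiliar with Bland–Altman analysis, the core computation is small: the paired differences between two sets of measurements, their mean (the bias), and an interval of ±2 standard deviations around it, matching the “95% CI (±2 SD)” convention used in Table 4. The sketch below is a generic illustration with hypothetical function and key names, not the SPSS procedure used in the study.

```python
import statistics


def bland_altman(measure_a, measure_b):
    """Bland-Altman summary for paired measurements (e.g. voxels per subject).

    Returns the mean difference (bias), the half-width of the agreement
    interval (2 SD of the differences), the interval limits, and the
    per-subject means and differences needed to draw the plot.
    """
    if len(measure_a) != len(measure_b):
        raise ValueError("both observers must measure the same subjects")
    diffs = [a - b for a, b in zip(measure_a, measure_b)]
    means = [(a + b) / 2.0 for a, b in zip(measure_a, measure_b)]
    bias = statistics.mean(diffs)
    half_width = 2.0 * statistics.stdev(diffs)  # sample SD of the differences
    return {
        "bias": bias,
        "half_width": half_width,
        "lower": bias - half_width,
        "upper": bias + half_width,
        "means": means,   # x-axis of the Bland-Altman plot
        "diffs": diffs,   # y-axis of the Bland-Altman plot
    }
```

Plotting `diffs` against `means` gives the Bland–Altman plot; a bias near zero and a narrow interval indicate good agreement between observers or between repeated measurements.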


The effect of CSF and WMH volume on iron load was assessed using linear regression on the full cohort. Neither WMH volume nor CSF volume was significantly associated with iron load as determined using the visual rating scales (P > 0.18) (Table 3; Fig. 3). This indicates that the following results are not influenced by either atrophy or WMHs.

Figure 3.

CSF and WMH volumes per visually rated iron load. These measurements were done in all 325 subjects. The iron load was rated according to two visual scales, one for putaminal iron deposits and the general rating scale for putaminal iron, microbleeds, and old hemorrhages.

Table 3. Results of Testing for Any Relationship Between Parenchymal Hyperintense Lesions and CSF Volume and Iron Load

Independent variables in the regression tests   Visual rating scale      P-value
Volume of the hyperintense lesions              0 to 4 basal ganglia     0.475
                                                0 to 3 whole iron load   0.325
CSF volume                                      0 to 4 basal ganglia     0.212
                                                0 to 3 whole iron load   0.181

Rated by the two visual rating scales in 325 subjects using a Bonferroni corrected ANOVA test.

Counting the Number of IDs

The mean interobserver difference obtained by the T2*W thresholding method was less than four IDs, but increased with increasing iron load above moderate according to the General Visual Rating Scale (Table 4; Fig. 4). For 10 or fewer IDs, the mean interobserver difference was ±1.5 IDs, and for more than 30 IDs counted by both analysts it was ±6 IDs.

Figure 4.

Bland–Altman plot of the results obtained by two analysts using the thresholding method to count the number of IDs in 70 subjects.

Table 4. Results of the Inter- and Intraobserver Reliability Tests, Given in Number of Voxels per Subject

Differences in                  No. of observers   Method                                              Mean difference (voxels)   95% CI (±2 SD)   Min. (absolute value) (voxels)   Max. (absolute value) (voxels)
Number of IDs counted           2                  Thresholding                                        3.898                      ±14.781          0                                23
Total volume of iron deposits   2                  Thresholding                                        −2.277                     ±57.649          0                                138
                                2                  MCMxxxVI *                                          29.869                     ±357.832         50                               636
                                1                  MCMxxxVI *                                          53.087                     ±466.240         1                                536
                                1                  MCMxxxVI with and without prior visual assessment   38.746                     ±391.526         0                                933
Number of BMBs counted          2                  Visual assessment (BOMBS)                           −0.543                     ±3.058           0                                5
Number of slices with IDs       2                  Visual assessment                                   2.143                      ±4.902           0                                12
                                1                  Visual assessment                                   1.143                      ±2.285           0                                9

The visual assessment was done on T2*W images.
* Indicates that the method was applied after the images were visually assessed.

Volume of IDs

The volumes of IDs measured by the T2*W thresholding method were smaller than the volumes obtained by MCMxxxVI (Fig. 5) for scans with moderate to high iron loads. For none or mild iron loads, rated according to the General Visual Rating Scale, the volumes provided by the two methods coincided. Both methods showed small systematic biases between observers, larger with MCMxxxVI (9 voxels) than with the T2*W thresholding method (2 voxels) (Fig. 6). The 95% confidence interval (CI) of the measurements obtained by both methods varied within an interval of ≈50% (T2*W thresholding) and 30% (MCMxxxVI) of the maximum volume of iron measured (Table 4).

Figure 5.

Bland–Altman plots. Interobserver variability for ID segmentation using MCMxxxVI and thresholding for T2*W sequences in 70 subjects, expressed in number of voxels per subject.

Figure 6.

Bland–Altman plot. Intra- and interobserver variability for ID segmentation using MCMxxxVI in 70 subjects, expressed in number of voxels per subject.

For the intraobserver reliability tests done for MCMxxxVI, there was no difference between the first and second measurements (Fig. 6) except for microbleeds in the brainstem and at the base of the cerebellum.

Visual Assessment

Microbleeds, Using BOMBS

For microbleeds visually identified using BOMBS, the interobserver difference in ID volume increased with increasing numbers of microbleeds, although the 95% CI of the assessments only varied in an interval of ±3 microbleeds and there was no bias, ie, mean difference of −0.5 (Table 4). The variability was maximal in the areas of the deep structures, namely, basal ganglia, internal/external capsules, and thalamus, and minimal in lobar regions, namely, subcortical white matter, cortex, or gray/white matter junction (Fig. 7).

Figure 7.

Bland–Altman plot of the interobserver assessment of microbleeds using BOMBS in 70 subjects, expressed in number of brain microbleeds.

IDs, Using the Putaminal Iron Rating Scale, the General IDs Visual Rating Scale, and Slice-per-Slice Visual Identification

Good results were obtained with MCMxxxVI without the visual assessment of the IDs on T2*W sequences. The average difference between the volume of IDs measured by MCMxxxVI with and without previous slice-per-slice visual identification was 38.74 voxels (Fig. 8; Table 4), which is similar to the interobserver difference.

Figure 8.

Bland–Altman plot. Reliability of MCMxxxVI with versus without concurrent visual assessment in the T2*W sequences in 70 subjects, expressed in number of voxels per subject.

The slices from each T2*W sequence on which the observers disagreed in identifying IDs were counted. Table 5 shows the number of slices that differed between two observers (interobserver) and by the same observer (intraobserver). Differences of six or more slices in the interobserver analysis occurred only five times among the 70 cases, representing only 7.1% of the total. In the intraobserver analysis, differences of six or more slices occurred only three times, representing only 4.3% of the total. These small disagreements did not affect the visual rating according to the Putaminal ID Visual Rating Scale or to the General IDs Visual Rating Scale.

Table 5. Number of Slices in Which the Visual Assessment of IDs Differed

Slice difference   Interobserver frequency (number of slices)   Interobserver percent (out of 70)   Intraobserver frequency (number of slices)   Intraobserver percent (out of 70)


Both quantitative techniques for segmenting IDs in structural brain MRI have high reliability and repeatability. However, there are factors that limit their applicability to research and general clinical use. MCMxxxVI uses a clustering method, MVQ, to segment the IDs. Taking advantage of the distributions of colors in an image, it produces segmentations of noticeably better quality than other clustering techniques (21), but small details could still be lost. To achieve high accuracy during segmentation, the color of the IDs should produce good contrast with the surrounding voxels. In other words, there is the potential for very small and hypointense IDs to be missed by MCMxxxVI.

Our results suggest that the T2*W thresholding method could be used for counting microbleeds, but is not advisable for hemorrhages or other types of IDs; MCMxxxVI does not currently enable lesion counting. This is because large circumferences and sizes increase the error, with the result that more manual postprocessing is required to obtain optimum results. However, the thresholding method benefits from high sensitivity. Hence, accurate delineation of the hypointense areas in T2*W can be achieved, but, compared with MCMxxxVI, it also requires more manual editing and does not discern the areas where iron is associated with other minerals that also appear hypointense on T2*W, like calcium. Therefore, parallel display of other scan types is necessary to avoid misclassifying iron and calcium. Moreover, when segmenting IDs with unclear boundaries the segmented region depends entirely on the threshold set by the observer. Here, setting of the threshold can be very variable and prone to bias depending on the observer's definition of the lesion, which in turn is likely to be affected by his/her neuroanatomical and neuroradiological knowledge and previous experience. This subjective nature of thresholding and segmentation appears to have been a major contributory factor to the interobserver variability seen when using the T2*W threshold-based method.

MCMxxxVI distinguishes areas where only iron has accumulated from those where other minerals are present by using combined information from two or more imaging modalities modulated in color space, in this case T2*W and FLAIR. It reduces the extent of subjective thresholding, but overestimates the boundaries of the IDs, mainly when the iron loads range from moderate to severe, making the results appear less favorable. Thus, although the results of our study show that both methods have high intra- and interobserver reliability, more testing is required before they are used in the clinical setting, in patients with different pathologies at different ages, eg, amyloid angiopathy, cavernous angioma, and traumatic brain injury, and using imaging data obtained from different scanners employing different scanning protocols.

In summary, both methods we tested for segmenting and measuring the volume of brain IDs are in agreement and have high reliability, but require a manual postprocessing step to remove false-positives that makes them observer-dependent. Prior visual assessment of the IDs does not increase the reliability of these methods. The T2*W thresholding method is recommended for automatically counting microbleeds, while MCMxxxVI is recommended for identifying putaminal IDs and distinguishing them from the areas where other minerals are present. More work is required to develop a reliable automatic method for use in research and clinical practice that combines the advantages of both approaches and includes automated counting of IDs such as microbleeds.