Multiple techniques exist for the automated segmentation of magnetic resonance images (MRIs). The validity of these techniques can be assessed by evaluating test–retest reliability, interscanner reliability, and consistency with manual segmentation. We evaluate these measures for the FSL/FIRST subcortical segmentation tool. We retrospectively analyzed 190 MRI scans from 87 subjects with mood or anxiety disorders and healthy volunteers scanned multiple times on different platforms (N = 56) and/or the same platform (N = 45; groups overlap), and 146 scans from subjects who underwent both high-resolution and whole-brain imaging in a single session, for comparison with manual segmentation of the hippocampus. The thalamus, caudate, putamen, hippocampus, and pallidum were reliably segmented in different sessions on the same scanner (intraclass correlation coefficient [ICC] > 0.83, scanners and diagnostic groups pooled). In these regions, between-platform reliabilities were lower (0.527 < ICC < 0.953), although values below 0.7 were due to systematic differences between platforms or low reliability in the hippocampus between eight- and single-channel coil platforms. Accumbens and amygdala segmentations were generally unreliable within and between scanning platforms. ICC values for hippocampal volumes between automated and manual segmentations were acceptable (ICC > 0.7, groups pooled), and both methods detected significant differences between genders. In addition, FIRST segmentations were consistent with manual segmentations (in a subset of images; N = 20) in the left caudate and bilateral putamen. This retrospective analysis assesses realistic performance of the algorithm in conditions like those found in multisite trials or meta-analyses. In addition, the inclusion of psychiatric patients establishes reliability in subjects exhibiting volumetric abnormalities, validating patient studies. Hum Brain Mapp 34:2313–2329, 2013. © 2012 Wiley Periodicals, Inc.
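
As an illustration of the reliability measure referred to above, the following is a minimal sketch of two common single-measure ICC forms: ICC(2,1) (absolute agreement) and ICC(3,1) (consistency), in the Shrout and Fleiss notation. The abstract does not state which ICC form was used, so the choice of forms, the function name `icc`, and the volume values below are assumptions made for illustration only; they are not the study's method or data.

```python
def icc(volumes):
    """Return (ICC(2,1) absolute agreement, ICC(3,1) consistency).

    volumes: list of rows, one row per subject; each row holds one
    segmented volume per session or scanner (same order for all subjects).
    """
    n = len(volumes)        # number of subjects
    k = len(volumes[0])     # number of sessions/scanners per subject

    grand = sum(sum(row) for row in volumes) / (n * k)
    row_means = [sum(row) / k for row in volumes]
    col_means = [sum(row[j] for row in volumes) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_total = sum((x - grand) ** 2 for row in volumes for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    consistency = (msr - mse) / (msr + (k - 1) * mse)
    return agreement, consistency


# Hypothetical hippocampal volumes (mm^3) for five subjects on "scanner A";
# "scanner B" is simulated as the same values plus a constant 200 mm^3 offset.
scanner_a = [3800.0, 4100.0, 3950.0, 4300.0, 3600.0]
volumes = [[a, a + 200.0] for a in scanner_a]

agreement, consistency = icc(volumes)
print(f"ICC(2,1) agreement:   {agreement:.3f}")
print(f"ICC(3,1) consistency: {consistency:.3f}")
```

In this toy example the constant offset leaves the consistency form essentially at 1 while the absolute-agreement form drops noticeably, which is one way a purely systematic difference between platforms can lower an ICC even when subjects are ranked identically.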