Automatic and manual segmentation of the piriform cortex: Method development and validation in patients with temporal lobe epilepsy and Alzheimer's disease

Abstract The piriform cortex (PC) is located at the junction of the temporal and frontal lobes. It is involved physiologically in olfaction as well as memory and plays an important role in epilepsy. Its study at scale is held back by the absence of automatic segmentation methods on MRI. We devised a manual segmentation protocol for PC volumes, integrated those manually derived images into the Hammers Atlas Database (n = 30) and used an extensively validated method (multi-atlas propagation with enhanced registration, MAPER) for automatic PC segmentation. We applied automated PC volumetry to patients with unilateral temporal lobe epilepsy with hippocampal sclerosis (TLE; n = 174 including n = 58 controls) and to the Alzheimer's Disease Neuroimaging Initiative cohort (ADNI; n = 151, comprising mild cognitive impairment (MCI), n = 71; Alzheimer's disease (AD), n = 33; controls, n = 47). In controls, mean PC volume was 485 mm³ on the right and 461 mm³ on the left. Automatic and manual segmentations overlapped with a Jaccard coefficient (intersection/union) of ~0.5 and a mean absolute volume difference of ~22 mm³ in healthy controls, ~0.40/~28 mm³ in patients with TLE, and ~0.34/~29 mm³ in patients with AD. In patients with TLE, PC atrophy lateralised to the side of hippocampal sclerosis (p < .001). In patients with MCI and AD, PC volumes were lower than those of controls bilaterally (p < .001). Overall, we have validated automatic PC volumetry in healthy controls and two types of pathology. The finding of early PC atrophy at the stage of MCI possibly adds a novel biomarker. PC volumetry can now be applied at scale.


KEYWORDS
Hammers Atlas Database, hippocampal sclerosis, MAPER, mild cognitive impairment, morphometry

| INTRODUCTION
The piriform cortex (PC) constitutes the largest part of the primary olfactory cortex (Löscher & Ebert, 1996). Situated between the temporal and frontal lobes, it lies rostromedial to the amygdala, covering the fundus of the entorhinal sulcus. It has frontal and temporal lobe parts (Vaughan & Jackson, 2014). Primary functions of PC are olfactory processing and memory coding (Howard et al., 2009; Meissner-Bernard et al., 2019).
The PC serves as a cortico-subcortical hub with extensive limbic and cortical connectivity (Pereira et al., 2005; Vaughan & Jackson, 2014). Its projections extend mainly to the periamygdaloid cortex and to the anterior and posterior cortical amygdalar nuclei (Vaughan & Jackson, 2014). For this reason, Gonçalves Pereira et al. (Pereira et al., 2005) subsumed the PC and adjacent regions under the term 'piriform cortex-cortical amygdala area'. PC forms a network with the perirhinal and entorhinal cortex, connecting it tightly to the limbic system, in particular the hippocampus. Malfunctions in this network develop quickly towards pathological synchronization and seizure spread (Vismer et al., 2015).
The area tempestas, situated in the anterior (frontal) PC, has been extensively studied in rodents and nonhuman primates. This region is defined based on chemical sensitivity, rather than anatomically: chemoconvulsants induced epileptic seizures at lower concentrations compared with other brain regions (Gale, 1988; Piredda et al., 1985).
Several reports illuminate the role of PC in patients with epilepsy.
Using manual segmentations in patients with temporal lobe epilepsy (TLE), Pereira et al. (2005) found PC volumes to be smaller ipsilaterally than contralaterally or than in healthy participants. PC atrophy strongly correlated with hippocampal atrophy/hippocampal sclerosis (HS). A recent study demonstrated progressive atrophy of both ipsilateral and contralateral PC with longer duration of TLE (Iqbal et al., 2022). Resecting a larger portion of PC yielded higher rates of seizure freedom after surgery, suggesting again that the region is involved in seizure generation (Borger et al., 2021; Galovic et al., 2019).
Furthermore, the PC plays crucial roles in other neurological diseases. For example, patients with early-stage Alzheimer's disease (AD) showed deficits in odour identification, along with impaired response profiles in functional MR of the PC (Li et al., 2010).
Investigators of the cited studies segmented the PC manually, which is time-consuming. Automatic segmentation would enable studies on larger patient cohorts. Automatic parcellation of the PC based on functional MRI was recently described only in healthy participants (Zhou et al., 2019). Leon-Rojas et al. (2021) added manual PC segmentation (delineated following the same protocol as in Iqbal et al. (2022)) to the Geodesic Information Flows (GIF) algorithm to create a template for automatic PC segmentation and described the intraoperative use in a single case report.
Here, we developed an automatic segmentation method for the PC on MRI and applied it to patients with atrophy of structures adjacent to the PC. Atrophy was either (1) unilateral (patients with TLE and HS) or (2) bilateral (patients with mild cognitive impairment (MCI) and patients with AD). We added the region to the Hammers Atlas Database (www.brain-development.org) (Faillenot et al., 2017; Gousias et al., 2008; Hammers et al., 2003; Wild et al., 2017) and used the resulting labels to carry out automatic PC segmentations using MAPER (multi-atlas propagation with enhanced registration) (Heckemann et al., 2010) in MR images of healthy participants, of patients with TLE and HS, and of patients with MCI or AD. Using quantitative evaluations, we compared and characterized both segmentation methods.
To evaluate our segmentation protocol and the 30 PC segmentations, we used the atlas data set to automatically segment two target data sets obtained in clinical neurological studies.
The first target data set (Keller et al., 2015)  The second target data set consisted of MR images from the ADNI database (adni.loni.usc.edu; for details of ethical approval, see www.adni-info.org). We downloaded 199 image sets of 151 unique subject IDs of the 'ADNI cohort 1 Baseline 3 T'. The image sets consist of T1-weighted baseline MR images acquired at 3 Tesla from patients with AD or MCI, and from healthy controls (voxel sizes 1.73 mm³). We used the images labelled as 'processed' via each centre's ADNI pipeline, including gradient unwarping (geometric distortion correction) for General Electric (GE) and Siemens scanners and bias correction for GE, Philips, and Siemens scanners (see https://adni.loni.usc.edu/methods/mri-tool/mri-analysis) as recommended by ADNI and to minimise differences between scans. Images and metadata of matching participants (n = 151) were accessed in April 2018.  (Yaakub et al., 2020). We subdivided the MCI group into progressive (pMCI) and stable MCI (sMCI), based on whether or not their diagnosis record changed to AD at any follow-up visit up to 36 months. Patients with MCI who converted to AD within 36 months were labelled as progressive MCI (pMCI). Patients whose diagnosis was still recorded as MCI at the last available follow-up visit were labelled as stable MCI (sMCI).
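The pMCI/sMCI labelling rule described above can be sketched as follows. This is a minimal illustration only: the record format (a list of `(months_since_baseline, diagnosis)` tuples) and the function name `label_mci` are hypothetical and do not reflect the actual ADNI metadata layout or the code used in the study.

```python
def label_mci(followup_diagnoses, max_months=36):
    """Label a baseline-MCI patient as progressive or stable MCI.

    followup_diagnoses: list of (months_since_baseline, diagnosis) tuples
    (hypothetical format). Only visits up to `max_months` are considered.
    """
    within = sorted((m, d) for m, d in followup_diagnoses if m <= max_months)
    # Any recorded change to AD within the window marks the patient as progressive.
    if any(d == "AD" for _, d in within):
        return "pMCI"
    # Still recorded as MCI at the last available visit: stable.
    if within and within[-1][1] == "MCI":
        return "sMCI"
    # No usable follow-up, or a last diagnosis that is neither MCI nor AD.
    return None


converter = label_mci([(6, "MCI"), (12, "MCI"), (24, "AD")])
stable = label_mci([(6, "MCI"), (36, "MCI"), (48, "AD")])
```

In this sketch, a conversion recorded after the 36-month window (second example) does not change the label.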
The work has been carried out in accordance with the code of ethics of the World Medical Association (Declaration of Helsinki).

| Measures of segmentation quality
To measure agreement of tested anatomical labels with reference labels, we relied primarily on the Jaccard overlap coefficient (Jaccard, 1901) (JC). Compared with the Dice coefficient (Dice, 1945), JC numbers are smaller and discriminate better within the range of values encountered in this study (Dice = 2 * JC/[1+ JC]).
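As an illustration of the two overlap measures and the conversion given above, the following minimal sketch computes JC on toy labels represented as sets of voxel coordinates. The representation and function names are assumptions made for the example, not part of the study's tooling.

```python
def jaccard(ref, test):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| for two sets of voxel indices."""
    ref, test = set(ref), set(test)
    union = ref | test
    if not union:
        raise ValueError("both labels are empty")
    return len(ref & test) / len(union)


def dice_from_jaccard(jc):
    """Convert a Jaccard coefficient to a Dice coefficient: 2 * JC / (1 + JC)."""
    return 2 * jc / (1 + jc)


# Toy labels: voxel coordinates (x, y, z) belonging to each segmentation.
manual = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)}
automatic = {(1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0), (5, 0, 0)}

jc = jaccard(manual, automatic)   # 3 shared voxels / 6 voxels in the union = 0.5
dsc = dice_from_jaccard(jc)       # 2 * 0.5 / 1.5, about 0.667
```

A JC of 0.5 thus corresponds to a Dice coefficient of about 0.67, which shows why JC values read lower than Dice values for the same pair of labels.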
To determine volume discrepancy (of which JC is not informative), we used the formula where vol_ref is the reference label volume and vol_test is the test label volume. We complemented this directional measure with |ΔV| (absolute bias), which allows averaging of multiple measurements. Volume discrepancy of left-sided versus right-sided labels was used to assess asymmetry.
The manual segmentation protocol minimised the risk of gross shape aberrations or disconnected label components. Automatically generated labels were reviewed visually to ascertain absence of such aberrations. This obviates the need for additional measures of agreement based on surface distances.

| Manual segmentations of the PC
Our PC outlining protocol builds on the work of Pereira et al. (2005) and Galovic et al. (2019). According to Pereira et al. (2005), the tem- The Hammers Atlas Database (www.brain-development.org) has been described previously (Faillenot et al., 2017; Gousias et al., 2008; Hammers et al., 2003; Wild et al., 2017). We applied the protocol to patients from the target data sets (20 each, chosen randomly), enabling comparisons of automatically and manually determined labels in pathology.

| Intrarater and interrater analysis of manual PC segmentation
We conducted a reliability analysis in test-retest fashion to validate the outlining protocol. Rater 1 (D.S.) segmented the right and left PC twice at an interval of 1 week on the images of five participants from the Hammers Atlas Database.
Additionally, we performed an objectivity analysis with two additional independent raters, M.S. (Rater 2) and L.G. (Rater 3). Both outlined the right and left PC of the same five participants who had been selected for the reliability analysis. Before the first round of segmentations, the second rater and the third rater were only provided with the written outlining protocol (supplement). One week later, the two additional raters independently underwent one practice session, where Rater 2 and Rater 3 practised segmenting the PC under guidance from Rater 1. Images from two participants were used for the practice session; these were not among the five participants used for the previous analysis. Six weeks (Rater 1) or 4 weeks (Rater 2) later, the raters segmented the same five participants a second time.
For both intrarater and interrater analyses, we determined JC, using the first round of segmentations by Rater 1 as the reference.

| Automatic segmentation of the PC
Automatic segmentation of the PC was performed using MAPER (Heckemann et al., 2010) (https://github.com/soundray/maper), an application for multi-atlas-based brain image segmentation. MAPER implements an ensemble machine learning approach to transfer anatomical knowledge represented in an atlas database to a target image.
Each atlas is paired with the target and registered using free-form deformation (Rueckert et al., 1999), initially considering tissue probability maps and subsequently T1-weighted images. The resulting geometric transformation is applied to the atlas labels, generating one segmentation label set per atlas in the target space. The individual label sets are then combined into a single output label set using vote-rule decision fusion (Kittler et al., 1998).
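The fusion step can be illustrated with a plain per-voxel majority vote. This is a sketch under stated assumptions: label sets are flattened lists of integer labels, the function name is hypothetical, and MAPER's actual implementation (including its tie handling) may differ.

```python
from collections import Counter


def fuse_by_majority_vote(label_sets):
    """Fuse per-atlas label sets voxel by voxel, keeping the most frequent label."""
    if not label_sets:
        raise ValueError("need at least one label set")
    n_voxels = len(label_sets[0])
    if any(len(ls) != n_voxels for ls in label_sets):
        raise ValueError("all label sets must cover the same voxels")
    fused = []
    for votes in zip(*label_sets):
        # most_common(1) yields the label with the highest count; on a tie,
        # the label encountered first among the votes wins.
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


# Three hypothetical propagated label sets over five voxels (0 = background, 1 = PC).
propagated = [
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 0, 0],
]
fused = fuse_by_majority_vote(propagated)
```

Using an odd number of atlases avoids ties for binary labels; with 30 atlases, a tie-breaking convention is needed.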
We conducted a leave-one-out cross-comparison analysis to measure agreement (JC, volume error) between MAPER-generated PC labels and manually generated ones.
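The leave-one-out scheme can be sketched as a loop in which each atlas in turn serves as the target and the remaining atlases as the reference set. The `segment` and `score` callables, the dictionary layout, and the toy stand-ins below are assumptions for illustration; in the study, segmentation was carried out with MAPER on image data.

```python
def leave_one_out(atlases, segment, score):
    """Segment each atlas image using all other atlases and score against its manual label."""
    results = {}
    for i, held_out in enumerate(atlases):
        training = atlases[:i] + atlases[i + 1:]
        predicted = segment(held_out["image"], training)
        results[held_out["id"]] = score(held_out["label"], predicted)
    return results


def toy_segment(image, training):
    """Stand-in for a segmentation method: keep voxels labelled in more than half of the training atlases."""
    counts = {}
    for atlas in training:
        for voxel in atlas["label"]:
            counts[voxel] = counts.get(voxel, 0) + 1
    return {v for v, c in counts.items() if c > len(training) / 2}


def toy_jaccard(ref, test):
    """Jaccard coefficient for two sets of voxel indices."""
    return len(ref & test) / len(ref | test)


# Hypothetical atlases; labels are sets of voxel indices, images are unused here.
atlases = [
    {"id": "a01", "image": None, "label": {1, 2, 3}},
    {"id": "a02", "image": None, "label": {2, 3, 4}},
    {"id": "a03", "image": None, "label": {2, 3, 5}},
]
loo_scores = leave_one_out(atlases, toy_segment, toy_jaccard)
```

Because the held-out atlas never contributes to its own segmentation, the resulting scores estimate performance on unseen images.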
Bland-Altman analysis was used to detect or exclude proportional bias on the volume measurements.
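A minimal sketch of a Bland-Altman analysis is given below. The volumes are made-up illustrative numbers, not study data; the use of absolute differences, the conventional 1.96 SD limits of agreement, and a Pearson correlation between pair means and differences as a check for proportional bias are assumptions of this sketch.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def bland_altman(vol_a, vol_b):
    """Return bias, limits of agreement and a proportional-bias correlation."""
    diffs = [a - b for a, b in zip(vol_a, vol_b)]
    means = [(a + b) / 2 for a, b in zip(vol_a, vol_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return {
        "bias": bias,
        "loa": (bias - 1.96 * sd, bias + 1.96 * sd),
        # A correlation between pair means and differences hints at proportional bias.
        "r_proportional": pearson_r(means, diffs),
    }


# Illustrative volumes in mm^3 (hypothetical values).
automatic_vol = [470.0, 455.0, 500.0, 480.0]
manual_vol = [480.0, 450.0, 520.0, 470.0]
ba = bland_altman(automatic_vol, manual_vol)
```

Differences falling within the limits of agreement, together with a correlation near zero, would argue against systematic or proportional bias between the two methods.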
We then used MAPER with all 30 atlases to segment the PC in the target data sets. We determined left/right asymmetry for each patient group (TLE with left/right sided HS; AD) and the healthy control groups. For method validation in patients with TLE and those with AD (cf. results Section 3.2), volumes of PC, amygdala, and hippocampus were normalised by intracranial volume (ICV; region volumes divided by ICV, scaled by 10⁴ for ease of reading). ICV was estimated using Pincram (Heckemann et al., 2015) (https://github.com/soundray/pincram).
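The ICV normalisation amounts to a single division and rescaling, shown here as a short sketch. The function name is hypothetical, and the ICV value in the example is an arbitrary illustrative number rather than a study measurement.

```python
def normalise_by_icv(region_volume_mm3, icv_mm3, scale=1e4):
    """Express a region volume as a fraction of intracranial volume, scaled by 10^4."""
    if icv_mm3 <= 0:
        raise ValueError("ICV must be positive")
    return region_volume_mm3 / icv_mm3 * scale


# Example: a PC volume of 485 mm^3 and an assumed ICV of 1,500,000 mm^3.
pc_normalised = normalise_by_icv(485.0, 1_500_000.0)
```

With these inputs the normalised value is about 3.23, which is easier to read and compare than the raw ratio of roughly 0.00032.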
As a sanity check, volumes of hippocampus and amygdala corrected by ICV were measured in both clinical data sets.
On the random samples from the TLE and AD data sets (20 each; see above) automatic-to-manual label agreement was determined.
Automatic segmentation was performed in native space. Overlaps in the interrater analysis were lower than in the intrarater analysis for both additional raters, but improved following the Volume discrepancies were similar between the right and left side (Table 1). The relative volume differences in relation to the average of both volumes of a pair of delineations in intrarater and interrater analysis are illustrated in Bland-Altman plots. A degree of inverse proportional bias is evident between average PC volume and relative PC volume difference in the interrater objectivity analysis (r = −.58, p = .008; Figure 1).

Hammers Atlas Database
Manual and automatic segmentations of the PC in the Hammers Atlas Database group overlapped with a mean JC of 0.52 (right) and 0.47 (left) (Table 2). For volume discrepancies, see Table 2.

Patients with TLE and patients with AD
In the 20 patients with TLE and HS, on unflipped images (i.e., not considering the lateralization of the epilepsy focus), the agreement between automatic and manual segmentation was slightly lower than in the Hammers Atlas Database group, with a mean JC of 0.41 (right) and 0.39 (left) (Table 2). For the group of 20 patients with AD, a mean JC of 0.35 (right) and 0.34 (left) was found. Volume discrepancy values are shown in Table 2 and volume differences are illustrated in Figure S2. For PC volumes distinguishing between ipsilateral and contralateral sides in a larger patient sample, see the following section.
One example of each patient group is illustrated in Figure 3.

| Scanner manufacturer
The PC volumes of the ADNI cohort (n = 151) differentiated by scanner type are shown in Table 3. The results of the equivalence test (TOST) of the volume measures between scanner types can be found in Table S1. The null hypothesis of nonequivalence can be rejected on the basis of the p-values obtained.

| Biological method validation and findings in patient groups
There was no significant association between age and ICV-corrected PC volume in healthy controls (NC) in the TLE NC group (r = −.062, p = .64, n = 58) or in the ADNI NC group (r = −.134, p = .37, n = 47).
While the association was in the expected direction, there was no significant association between PC volume and latency between patient enrollment and clinical progress into AD (r = .264, p = .12, n = 36). In a sub-analysis for patients with pMCI, the ICV-corrected volume was smaller than in healthy controls (F = 23.04, df = 2, p < .001; pMCI: 19% volume discrepancy, p < .001; AD: 19% volume discrepancy, p < .001). No difference between pMCI and AD was evident.

| DISCUSSION
We  Using the interrater measures of agreement (JC and volume discrepancy) as benchmarks, automatic delineation of PC was similarly accurate when applied to healthy participants in the Hammers Atlas Database. Bland-Altman plots showed volume differences within limits of agreement. Overlap measures such as JC decrease with surface-to-volume ratios (SVR) and increase with region volumes (Rohlfing et al., 2004). With respect to SVR and region volume, overlap between manual and automatic segmentations of PC was comparable to similarly anisotropic structures like the temporal horn of the lateral ventricle (Yaakub et al., 2020).
In patients with TLE with HS, the agreement between automatic and manual labels was as high as in the Hammers Atlas Database group. In patients with MCI and AD, it was lower than in the other two groups, but still satisfactory. In the presence of atrophy, the SVR will be higher, meaning that smaller and more dispersed JC values cannot be taken to indicate poorer segmentation accuracy.
Previous studies reported interrater analyses (Galovic et al., 2019, Data S1) with intraclass correlation coefficients or intrarater and interrater reliability assessment with Bland-Altman plots (Iqbal et al., 2022). In our study, we demonstrated that for our protocol there was also no significant volume difference in an intra- and interrater test using the Bland-Altman method. In addition, we conducted an interrater analysis with two additional independent raters. Going beyond volumetric comparison, we added overlap analyses using JC. This is important as even incongruent labels can have identical volumes.
For manual segmentation of PC, we mainly followed the work of Pereira et al. (2005), who used histological studies to validate defined anatomical landmarks for segmentation of the PC in MR images. The frontal part of the PC is less accurately defined by MR intensity gradients or landmarks than the temporal part. Pereira et al. (2005) excluded frontal PC from their definition, as they considered its borders in MR images insufficiently distinct. Other studies (Borger et al., 2021; Galovic et al., 2019) did include the frontal part in their definitions of the PC, building on the work of Mai et al. (2008). Considering that the area tempestas, a part of the frontal PC, is a particularly epileptogenic region, we decided to include the frontal PC in our anatomical definition of the region. We terminated PC segmentation in the most rostral slice anterior to the first appearance of hippocampus to avoid any overlap with hippocampal regions and to reach higher reproducibility. Pereira et al. (2005)  We note that the study by Iqbal et al. (2022) reported significantly larger volumes of the right PC than of the left PC in healthy controls. In contrast, this asymmetry was not evident in the current study and in the study of Pereira et al. (2005). This will be explained by the different termination of PC segmentation. Our study defines the last slice by first appearance of the hippocampus and Pereira et al. for the TLE and ADNI groups do, that is, the volume loss tends to be underestimated compared with manual segmentation. This is a variant of regression towards the mean likely to occur when a training sample exhibits fewer extreme characteristics than the testing sample, for example, when the training (atlas, normal reference) sample represents a broader population than the testing (target, patient) sample.
We note that volume bias did not prevent automatic detection of the abnormalities. The GIF algorithm performs well on small regions and brain regions which can only be delineated by grey-scale gradients (Cardoso et al., 2015) and would hence be predicted to work well for PC. However, it is not publicly available so we could not add the PC as a region to be segmented by GIF.
In patients with TLE with HS, PC atrophy was seen on the same side as HS. The scale of atrophy in comparison with matched healthy controls was comparable between PC and amygdala, while the extent of atrophy in the ipsilateral hippocampus was considerably larger. This is in line with results of Pereira et al. (2005), who reported similar observations. Divergent definitions of PC (see above) preclude quantitative comparison of PC atrophy between the studies.
The PC is a key node within the epileptogenic network of focal epilepsies, particularly of TLE with HS (Laufs et al., 2011; Young et al., 2018). The important role of the PC in TLE was confirmed in studies by Galovic et al. (2019) and Borger et al. (2021), which demonstrated that resection of the PC predicts seizure freedom after TLE surgery.
While our biological findings in TLE corroborate those of earlier studies, we also generated novel results in the form of PC volumetry in MCI and AD. Atrophy of the PC was found in comparison with healthy controls. AD is known to affect olfaction in the sense of higher-order deficits: patients progressively lose the ability to perceive odour quality, to discriminate between odours, and to identify them (Kesslak et al., 1988; Li et al., 2010). This indicates early involvement of higher-order olfactory pathways that include the PC (Li et al., 2010), in congruence with findings in patients with MCI and AD who on functional MR imaging showed less activation than controls in the anterior piriform cortex during odour identification (Kjelvik et al., 2021).
The amygdala and hippocampus, direct neighbours of PC, are known to shrink during AD progression (Heckemann et al., 2011; Klein-Koerkamp et al., 2014). PC volume loss was approximately twice as large as in the hippocampus, and also larger than in the amygdala.
PC atrophy was similarly distinct in patients with MCI as in those with AD, suggesting PC atrophies early on in AD. PC volume may thus have a role as a novel biomarker for early AD stages.  Table S1 show that the null hypothesis of scanner differences can be rejected for all comparisons. In conclusion, PC volumetry results were independent of the manufacturer of the scanning equipment.
A limitation is that the MR images in the Hammers Atlas Database had been acquired at 1.5 Tesla. While they are of good quality, certain anatomical landmarks are easier to identify on images acquired at higher field strengths, especially subtle correlates of the sulcus semiannularis, which marks the medial border of PC in rostral slices.
While empirically, intrarater and interrater agreement was strong, we found that identification of the limen insulae marking the most rostral slice on which PC was to be outlined could be difficult due to partial volume effects. Rater training mitigated these difficulties.
While the method presented here will allow automatic determination of extent and volume of the PC, if manual verifications are desired, a corresponding training period should be envisaged.

| CONCLUSION
The multi-atlas MAPER pipeline and manual delineations of PC integrated into the Hammers Atlas Database enable reliable automatic delineation of PC in healthy adult participants as well as in patients with TLE and HS or AD. As expected, PC volumes are unilaterally smaller on the side of HS and bilaterally smaller in MCI and AD. We found that atrophy of the PC in MCI and AD occurred early and exceeded that of hippocampus and amygdala, pointing to a potential role as a biomarker. The possibility to obtain PC volumes at scale will now allow such biomarker and functional studies.