Patch‐wise brain age longitudinal reliability

Abstract We recently introduced a patch‐wise technique to estimate brain age from anatomical T1‐weighted magnetic resonance imaging (T1w MRI) data. Here, we sought to assess its longitudinal reliability by leveraging a unique dataset of 99 longitudinal MRI scans from a single, cognitively healthy volunteer acquired over a period of 17 years (aged 29–46 years) at multiple sites. We built a robust patch‐wise brain age estimation framework on the basis of 100 cognitively healthy individuals from the MindBoggle dataset (aged 19–61 years) using the Desikan‐Killiany‐Tourville atlas, then applied the model to the volunteer dataset. The results show a high prediction accuracy on the independent test set (R² = .94, mean absolute error of 0.63 years) and no statistically significant difference between manufacturers, suggesting that the patch‐wise technique has high reliability and can be used for longitudinal multi‐centric studies.


| INTRODUCTION
Brain age estimation has become a research topic of considerable interest in neuroimaging studies, progressively attracting attention from both clinical and engineering communities (Cole, Marioni, Harris, & Deary, 2019). In the last decade, substantial efforts have been devoted toward the development of highly accurate brain age estimation frameworks through modalities as different as anatomical MRI (K. Franke, Ziegler, Kloppel, Gaser, & Alzheimer's Disease Neuroimaging Initiative, 2010), fluorodeoxyglucose positron emission tomography imaging (Goyal et al., 2019), and brain electroencephalogram signals (Al Zoubi et al., 2018).
Brain age estimation techniques based on anatomical MRI can be classified into three approaches: (1) voxel-wise techniques, introduced by Franke and colleagues (K. Franke et al., 2010), which use voxel signal intensities obtained from gray matter (GM), white matter (WM), or a combination of both as independent variables in the brain age estimation framework. An extensive review of the voxel-wise technique and its application in neuroimaging studies is presented in Katja Franke and Gaser (2019); (2) region-wise techniques, as proposed by Valizadeh, Hanggi, Merillat, and Jancke (2017), which employ brain anatomical measures such as those obtained with a segmentation algorithm (e.g., FreeSurfer [http://freesurfer.net]) as independent variables in the brain age estimation framework (Pardoe & Kuzniecky, 2018). These techniques have been used in investigations of brain age not only among healthy individuals, but also in different neurological diseases (Katja Franke & Gaser, 2019); and (3) patch-wise grading, introduced by Beheshti and colleagues (Beheshti, Gravel, Potvin, Dieumegarde, & Duchesne, 2019) as the most recent approach in the field. It uses image similarity metrics to match test patches to patches with known labels from a library drawn from the training set, then weights and averages information from the training set (e.g., chronological age) to derive final values (e.g., brain age) for the unseen test image.
Although both voxel- and region-wise techniques have shown acceptable prediction accuracies (i.e., mean absolute error [MAE] ranging from 4 to 8 years), the patch-wise technique demonstrated an increased prediction accuracy on an independent test set (MAE < 2 years; Beheshti et al., 2019). While all three brain age estimation paradigms have been widely used in cross-sectional studies, there have been few investigations of their reliability for longitudinal brain age assessment. In fact, only Cole and colleagues have reported longitudinal results of voxel-wise brain age estimation using deep learning (Cole et al., 2017), with MAE of 4.16, 5.17, and 4.34 years for GM, WM, and GM + WM modalities, respectively, and a high test-retest reliability (ICC > 0.90).
We decided to explore this aspect further by leveraging a unique dataset of 99 longitudinal MRI scans from a single, cognitively healthy volunteer acquired over a period of 17 years (aged 29-46 years). We were able to demonstrate the reliability of the patch-wise technique (cf., Section 3.2), as well as investigate the influence of different scanner manufacturers on its accuracy (cf., Section 3.3).

| Training MRI dataset
We used the same training set as in our previous study (Beheshti et al., 2019) to train the patch-wise brain age estimation framework, specifically the MindBoggle-101 dataset (https://mindboggle.info; Klein & Tourville, 2012), which is composed of T1w MRI scans of 100 cognitively healthy individuals between the ages of 19 and 61 years (M = 28.32, SD = 8.38, 44% female). This dataset was acquired on scanners from two different manufacturers (Siemens, N = 64; Philips, N = 36). The age distribution of the training set is presented in Figure 1a. As described in our previous work, we extracted 62 cortical labels for each T1w MRI scan based on the Desikan-Killiany-Tourville parcellation protocol (Klein & Tourville, 2012) with the FreeSurfer segmentation software (http://freesurfer.net; version 5.3; default setting, recon-all). Each brain segmentation was visually inspected thoroughly on all slices in the coronal plane. The stability of the parcellation was not measured, but given that the images came from a large number of different scanners, some notable variability was likely to occur. For example, morphometric variability was previously reported using the subset of images acquired after a scanner upgrade (Trio to Prisma) on three Siemens scanners.
Next, all MRI images were mapped into pseudo-Talairach MNI space on the basis of an affine linear registration (MINC 2.2.00 toolkit; mincresample function, default setting, and tri-linear interpolation method), then resampled to a voxel size of 1 × 1 × 1 mm³. To map labels into pseudo-Talairach MNI space, we used the mincresample function with nearest-neighbor interpolation. The skull and other nonbrain tissues were eliminated using an intracranial mask. Finally, in order to diminish intensity variations among various scanner models, the voxel intensity of each registered MRI image was linearly mapped to a [0-100] range as follows:

$$\mathrm{MRI}_{\mathrm{norm}} = 100 \times \frac{\mathrm{MRI}_{\mathrm{raw}} - \min(\mathrm{MRI}_{\mathrm{raw}})}{\max(\mathrm{MRI}_{\mathrm{raw}}) - \min(\mathrm{MRI}_{\mathrm{raw}})}$$

where $\mathrm{MRI}_{\mathrm{raw}}$, $\min(\mathrm{MRI}_{\mathrm{raw}})$, and $\max(\mathrm{MRI}_{\mathrm{raw}})$ stand for the raw intensities, the minimum intensity, and the maximum intensity in each MRI image, respectively, and $\mathrm{MRI}_{\mathrm{norm}}$ denotes the rescaled intensities. The MRI image preprocessing was conducted using the MINC 2.2.00 toolkit.
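As an illustration of the linear intensity mapping described above, the following minimal sketch rescales a flattened list of voxel intensities to the [0-100] range. The function name `rescale_intensity` and the handling of a constant image are assumptions made for this example; the study itself performed preprocessing with the MINC toolkit.

```python
def rescale_intensity(voxels, new_max=100.0):
    """Linearly map raw intensities to [0, new_max] (min-max rescaling).

    `voxels` is a flat list of intensities taken from one image.
    """
    lo, hi = min(voxels), max(voxels)
    if hi == lo:
        # Assumption: a constant image is mapped to all zeros to avoid
        # dividing by zero; the source does not discuss this case.
        return [0.0 for _ in voxels]
    return [new_max * (v - lo) / (hi - lo) for v in voxels]


# Toy intensities standing in for one registered, skull-stripped image.
raw = [120.0, 480.0, 300.0, 840.0]
rescaled = rescale_intensity(raw)
print(rescaled)
```

Because the minimum and maximum are taken per image, each scan is mapped to the same range regardless of the scanner's native intensity scale.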

| Testing MRI dataset
FIGURE 2 The pipeline of the proposed patch-wise brain age framework

| Patch-wise grading for brain age estimation
The technical details of the patch-wise grading brain age estimation have been fully described previously (Beheshti et al., 2019). In summary, for each test label under study, a library of the N (N = 20) closest subjects from the training set was composed on the basis of the sum of squared differences criterion. It is worth mentioning that generating the library set was entirely independent of scanner manufacturer. Next, for each voxel $x_i$ under study, a patch comparison was conducted between the patch $p(x_i)$ (i.e., a 7 × 7 × 7 voxel cube) and all patches $p(x_{s,j})$ from the library set. This comparison yields the following weighting function between the voxel under study $x_i$ and the voxel $x_{s,j}$ from the training library (Coupe et al., 2011):

$$w(x_i, x_{s,j}) = \begin{cases} \exp\left(-\dfrac{\lVert p(x_i) - p(x_{s,j}) \rVert_2^2}{h^2}\right) & \text{if } ss > th \\ 0 & \text{otherwise} \end{cases}$$

$$ss = \frac{2\,\mu_{p(x_i)}\,\mu_{p(x_{s,j})}}{\mu_{p(x_i)}^2 + \mu_{p(x_{s,j})}^2} \times \frac{2\,\sigma_{p(x_i)}\,\sigma_{p(x_{s,j})}}{\sigma_{p(x_i)}^2 + \sigma_{p(x_{s,j})}^2}$$

In the above equations, $w(x_i, x_{s,j})$ refers to the weighting function for the $(x_i, x_{s,j})$ pair; $\lVert \cdot \rVert_2$ is the $L_2$-norm; and $p(x_{s,j})$ stands for the patch centered on the $j$-th voxel of the training sample $s$ (i.e., $x_{s,j}$). We carried out a preselection technique based on the structural similarity measure (Wang, Bovik, Sheikh, & Simoncelli, 2004) to pick the most informative patches, so that weak patches that do not meet the threshold $th$ on $ss$ are omitted. Finally, $\mu_{p(x)}$ and $\sigma_{p(x)}$ are the mean and standard deviation of voxel values in the patch $p(x)$, respectively, while $h$ is the smoothing parameter, which can be computed as follows:

[equation missing]

where, as indicated in Coupe, Eskildsen, Manjon, Fonov, and Collins (2012), the constants $\lambda$ and $\delta$ were set to 0.5 and $10^{-7}$, respectively. The only difference from our prior study is that the grading value $g$ at the voxel $x_i$ has been modified as:

[equation missing]

where $V_i$ refers to the search volume, which was varied from 9 × 9 × 9 to 15 × 15 × 15 to find the ideal size (Coupe et al., 2012), and $\mathrm{Age}_{\mathrm{Test}}$ and $\mathrm{Age}_s$ are the test participant's and training library subject's chronological ages, respectively.
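The patch weighting described above can be sketched as follows. This is a simplified illustration rather than the study's implementation: patches are flattened lists, the smoothing parameter `h` is passed in directly instead of being estimated, the preselection threshold of 0.95 and the small stabilizing constant `eps` are assumptions, and the final step is a plain weighted average of library values (e.g., chronological ages), which is not the modified grading formula used in this study. All function names are hypothetical.

```python
import math


def patch_stats(patch):
    """Mean and population standard deviation of a flattened patch."""
    n = len(patch)
    mu = sum(patch) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in patch) / n)
    return mu, sd


def structural_similarity(p, q, eps=1e-12):
    """Simplified ss: a mean (luminance) term times a contrast term.

    eps is an assumed stabilizer for flat or zero-mean patches.
    """
    mu_p, sd_p = patch_stats(p)
    mu_q, sd_q = patch_stats(q)
    luminance = (2 * mu_p * mu_q + eps) / (mu_p ** 2 + mu_q ** 2 + eps)
    contrast = (2 * sd_p * sd_q + eps) / (sd_p ** 2 + sd_q ** 2 + eps)
    return luminance * contrast


def patch_weight(p, q, h, ss_threshold=0.95):
    """exp(-||p - q||_2^2 / h^2) if the pair passes preselection, else 0."""
    if structural_similarity(p, q) <= ss_threshold:
        return 0.0
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-sq_dist / h ** 2)


def weighted_average(test_patch, library, h):
    """Weighted mean of library values; None if every patch is rejected.

    `library` is a list of (patch, value) pairs, e.g. (patch, age).
    """
    num = den = 0.0
    for patch, value in library:
        w = patch_weight(test_patch, patch, h)
        num += w * value
        den += w
    return num / den if den > 0 else None


test_patch = [1.0, 2.0, 3.0, 4.0]
library = [
    ([1.0, 2.0, 3.0, 4.0], 30.0),      # identical patch, weight 1
    ([1.1, 2.0, 3.0, 4.0], 40.0),      # near-identical patch
    ([10.0, 20.0, 30.0, 40.0], 90.0),  # rejected by the ss preselection
]
print(weighted_average(test_patch, library, h=1.0))
```

In this toy library the third patch has a very different mean intensity, so it fails the preselection and contributes nothing, which is the purpose of the threshold on ss.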
After computing the grading values for all voxels within a label, we calculated the final patch-wise grading value by averaging grading values of all voxels over the respective label. Figure 2 illustrates the pipeline of the proposed patch-wise brain age estimation framework, while the pseudo-code of the patch-wise grading stage is shown in Pseudo Algorithm 1.
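Two bookkeeping steps of the pipeline, selecting the N closest training subjects by the sum of squared differences and averaging voxel-level grading values over a label, can be sketched as below. The data layout (a dictionary from subject identifier to a flattened list of intensities for the same label) and the function names are assumptions made for illustration only.

```python
def sum_squared_difference(a, b):
    """Sum of squared differences between two equally sized intensity lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_library(test_label, training_labels, n_closest=20):
    """Return the ids of the n_closest training subjects, most similar first.

    `training_labels` maps a subject id to that subject's intensities
    for the label under study (hypothetical layout).
    """
    ranked = sorted(
        training_labels,
        key=lambda sid: sum_squared_difference(test_label, training_labels[sid]),
    )
    return ranked[:n_closest]


def label_grading(voxel_gradings):
    """Final patch-wise grading of a label: mean over its voxel gradings."""
    return sum(voxel_gradings) / len(voxel_gradings)


training = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [5.0, 5.0]}
print(select_library([1.0, 1.0], training, n_closest=2))
print(label_grading([1.0, 2.0, 3.0]))
```

Note that the selection uses image intensities only, which is consistent with the statement that the library was built independently of scanner manufacturer.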

| Validation and performance assessment
To predict brain age, we used a support vector machine regression predictor implemented in MATLAB (i.e., "fitrsvm" function, kernel: linear, [...]).

Figure 3 shows the estimated brain age as a function of chronological age, as well as the prediction difference (brain age delta) against the mean of chronological age and predicted brain age (i.e., Bland-Altman plot), for the training set model obtained via a leave-one-out strategy. Our prediction model reached a high predictive accuracy in this training set (MAE = 1.30 years, RMSE = 1.66 years, and R² = .96).

FIGURE 3 Training set model: (a) Scatter plot of estimated brain age as a function of chronological age. The solid black line shows the regression line, while the dashed black lines stand for the 95% prediction band on the model prediction. (b) Bland-Altman plot between estimated brain age and chronological age. The Mean axis is the average of estimated brain age and chronological age, and the Δ axis refers to the difference between chronological age and estimated brain age. The solid black line represents the mean age difference between estimated brain age and chronological age, while the dashed black lines show ±1.96 standard deviations. RPC and CV are reproducibility coefficient and coefficient of variation, respectively; MD is mean difference between estimated brain age and chronological age

FIGURE 4 Evaluation of the performance of patch-wise brain age on a single individual volunteer across time. (a) Scatter plot of estimated brain age as a function of chronological age. The solid black line shows the regression line, while the dashed black lines stand for the 95% prediction band on the model prediction. (b) Bland-Altman plot between estimated brain age and chronological age. The Mean axis represents the average of estimated brain age and chronological age; the Δ axis refers to the difference between chronological age and patch-wise brain age. The solid black line stands for the mean age difference between estimated brain age and chronological age, while the dashed black lines show ±1.96 standard deviations. RPC and CV are reproducibility coefficient and coefficient of variation, respectively; MD is mean difference between estimated brain age and chronological age
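The accuracy measures reported here (MAE, RMSE, R²) and the Bland-Altman quantities named in the figure captions can be computed along the following lines. This is a sketch under stated assumptions: R² is taken as the coefficient of determination, the brain age delta is taken as predicted minus chronological age, the standard deviation is the sample standard deviation, and RPC is taken as 1.96 times that standard deviation. The source does not spell out these conventions, and the function names are hypothetical.

```python
import math


def regression_metrics(chronological, predicted):
    """Return (MAE, RMSE, R^2) for paired age lists.

    R^2 is the coefficient of determination (assumed convention); the
    chronological ages must not all be identical.
    """
    n = len(chronological)
    errors = [p - c for p, c in zip(predicted, chronological)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_c = sum(chronological) / n
    ss_tot = sum((c - mean_c) ** 2 for c in chronological)
    return mae, rmse, 1.0 - ss_res / ss_tot


def bland_altman(chronological, predicted):
    """Return (mean difference, lower limit, upper limit, RPC).

    Differences are predicted minus chronological age (assumed sign);
    limits are the mean difference +/- 1.96 sample standard deviations.
    """
    n = len(chronological)
    diffs = [p - c for p, c in zip(predicted, chronological)]
    md = sum(diffs) / n
    sd = math.sqrt(sum((d - md) ** 2 for d in diffs) / (n - 1))
    rpc = 1.96 * sd
    return md, md - rpc, md + rpc, rpc


# Toy ages in years, not data from the study.
chron = [30.0, 40.0, 50.0]
pred = [31.0, 39.0, 52.0]
print(regression_metrics(chron, pred))
print(bland_altman(chron, pred))
```

Reporting MAE together with the Bland-Altman mean difference separates the typical size of the error from any systematic over- or under-estimation of age.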

| Longitudinal performance
To assess the longitudinal reliability, we applied the brain age model from the training set to the test set. Figure 4 shows the estimated brain age plotted as a function of chronological age, as well as the prediction difference (brain age delta) against the mean of chronological age and predicted brain age (i.e., Bland-Altman plot) for a single [...]

Figure 6 illustrates the resulting patch-wise grading values on our test dataset with respect to scanner manufacturer at this regional level. A summary of statistical information related to the grading values across the cortex with respect to MRI manufacturers is presented in Table 2, while Table 3 lists the highest grading values achieved from the proposed patch-wise technique. As can be seen from [...]

| DISCUSSION
The main objective of this study was to assess the reliability of the patch-wise brain age estimation technique in a longitudinal setting. In our previous study (Beheshti et al., 2019), we extended the notion of patch-wise grading from Coupé and colleagues (Coupé et al., 2012) to estimating brain age across the cortex from 3D anatomical MRI data.
Our proposed patch-wise grading technique was tested in a cross-sectional design and showed significantly improved prediction accuracy in an independent test set (MAE < 2 years) when compared to state-of-the-art methods (Beheshti et al., 2019). When testing on the longitudinal SIMON dataset, we accurately estimated brain age with an MAE < 1 year over a long age span (17 years) covering early middle age (29-46 years old). These results support our claim that the patch-wise technique is amenable to longitudinal brain age studies.
In a previous report (K. Franke et al., 2010), the authors investigated the influence of different scanner manufacturers on a voxel-wise brain age estimation framework. They reported a slight difference in terms of prediction accuracy between individual scanner manufacturers. In our previous study, we assessed the influence of different MRI manufacturers on the patch-wise technique for cross-sectional studies (see Supporting Information). In the present study, we also explored the impact of different MRI manufacturers on the patch-wise brain age estimation framework for longitudinal brain age estimation studies. Based on our results, we did not observe a statistically significant difference among scanners in terms of brain age delta (Figure 5).