Enhanced detection of cortical atrophy in Alzheimer's disease using structural MRI with anatomically constrained longitudinal registration

Abstract
Cortical atrophy is a defining feature of Alzheimer's disease (AD), often detectable before symptoms arise. In surface-based analyses, studies have commonly focused on cortical thinning while overlooking the impact of loss in surface area. To capture the impact of both cortical thinning and surface area loss, we used anatomically constrained Multimodal Surface Matching (aMSM), a recently developed tool for mapping change in surface area. We examined cortical atrophy over 2 years in cognitively normal subjects and subjects with diagnoses of stable mild cognitive impairment, mild cognitive impairment that converted to AD, and AD. Magnetic resonance imaging scans were segmented and registered to a common atlas using previously described techniques (FreeSurfer and ciftify), then longitudinally registered with aMSM. Changes in cortical thickness, surface area, and volume were mapped within each diagnostic group, and groups were compared statistically. Changes in thickness and surface area detected atrophy at similar levels of significance, though regions of atrophy somewhat differed. Furthermore, we found that surface area maps offered greater consistency across scanners (3.0 vs. 1.5 T). Comparisons to the FreeSurfer longitudinal pipeline and parcellation-based (region-of-interest) analysis suggest that aMSM may allow more robust detection of atrophy, particularly in earlier disease stages and using smaller sample sizes.

such as speaking, swallowing, and walking (Alzheimer's Association, 2020). The onset of AD is insidious, with years between initial biochemical changes and the onset of clinically apparent symptoms (Brookmeyer, Abdalla, Kawas, & Corrada, 2018). This pattern of slow progression offers a window for early detection and has raised the need for noninvasive methods of tracking AD-related changes.
Magnetic resonance imaging (MRI), which can be used to monitor the cortical changes that accompany AD, has become a useful tool in addressing this need (Chandra, Dervenoulas, Politis, & ADNI, 2019; Femminella et al., 2018). Three-dimensional reconstructions derived from MRI allow for analysis of cortical anatomy, and surface-based methods of analyzing these reconstructions have proven beneficial (Clarkson et al., 2011; Glasser et al., 2013; Robinson et al., 2018).
Within the surface-based methodologies, previous studies have often examined changes in cortical thickness as a marker of atrophy in AD (Belathur Suresh, Fischl, Salat, & ADNI, 2018; Clarkson et al., 2011; Cuingnet et al., 2011; Mattsson et al., 2018; Ossenkoppele et al., 2019). However, in this study we propose that vertex-wise mapping of cortical surface area changes, in addition to cortical thickness changes, could offer an improved method of tracking cortical atrophy over time.
A recently developed tool, anatomically constrained Multimodal Surface Matching (aMSM), has been shown to enable more accurate tracking of physical cortical changes over time in a single subject. aMSM constrains local deformations of the anatomical surface to realistic behavior, based on the mechanical properties of brain tissue, producing detailed, vertex-wise maps of change in surface area. This methodology has previously been used to track cortical growth, but it has not yet been leveraged to measure patterns of cortical atrophy. Because cortical volume depends on both thickness and surface area, detailed maps of surface area loss, in combination with maps of cortical thinning, offer the ability to better characterize degenerative changes occurring across the cortex in AD.
In this study, we used MRI data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) to test the utility of aMSM in accurately detecting patterns of cortical atrophy. Using aMSM, we mapped changes in both thickness and surface area over 2 years in individual subjects with diagnoses of cognitively normal (CN), stable mild cognitive impairment (MCI-S), mild cognitive impairment which converted to AD (MCI-C), and AD. First, each diagnostic group was evaluated to identify patterns of atrophy in terms of thickness, surface area, and volume loss. Then, comparisons between groups were used to identify diagnostically relevant differences in atrophy. To compare our approach to existing methodologies, we repeated our analysis using the FreeSurfer longitudinal pipeline (Reuter, Schmansky, Rosas, & Fischl, 2012) and region-of-interest (ROI) analysis. Next, to examine the precision of aMSM-generated measures of atrophy across imaging methods, we compared atrophy maps from a subset of subjects who were scanned using two different acquisition protocols (3.0 vs. 1.5 T scanners). Finally, to evaluate the ability of aMSM to identify trends in atrophy across a range of sample sizes, we performed four parallel analyses using diagnostic groups made up of 12, 24, 48, and 90 subjects per group.

| Data
Data used in the preparation of this article were obtained from the ADNI database (adni.loni.usc.edu).

| Subject selection
All subjects who participated in ADNI1 and underwent both baseline and 2-year 3.0 T MRI scans were considered for inclusion in the primary analysis of this study. Age of subjects at the beginning of the study ranged from 55 to 90. All participants were required to have a study partner able to independently evaluate their functioning, be fluent in either English or Spanish, and be willing and able to participate in testing procedures and longitudinal follow-up.
In ADNI1, cognitively normal subjects were defined as having To ensure that all subjects were scanned on the same scanner at baseline and 2-year follow-up, only subjects from standardized analysis sets were considered (Wyman et al., 2013). From this cohort, we identified subjects with baseline and 2-year 3.0 T scans. Subjects were removed from analysis if their baseline or 2-year scan failed at any point in MRI processing, including segmentation or registration. To test for significant differences in age, education, gender, race, ethnicity, study site, and MRI scanner manufacturer between diagnostic groups, one-way ANOVA or the exact multinomial test of goodness-of-fit was used, as appropriate.
While our primary analysis focuses on 3.0 T MRI scans, ADNI1 1.5 T scans were utilized to further validate our methodology. In ADNI1, only a quarter of subjects were scanned at 3.0 T, while all subjects were scanned on 1.5 T MRI scanners. This provided a large pool of additional subjects to supplement our primary, small cohort.
ADNI1 subjects included in the standardized analysis sets (Wyman et al., 2013) who had baseline and 2-year follow-up 1.5 T scans were considered for this analysis. Subjects were excluded if MRI processing failed at any point, including segmentation or registration.

| MRI acquisition and segmentation
MRI scans were collected according to the ADNI1 MRI protocol, which included back-to-back 3D magnetization-prepared rapid gradient echo (MP-RAGE) scans and B1 calibration scans when applicable (Jack et al., 2008). To increase standardization of images acquired at different sites, system-specific postacquisition corrections were performed to address image artifacts, including errors in image geometry, intensity nonuniformity, and image intensity (Jack et al., 2008).
We utilized quality assurance images produced by the ciftify cifti_vis_recon_all utility (Dickie et al., 2019) to conduct an initial review of segmentation quality. In a minority of cases, segmentation quality could not be determined from these images alone. In these instances, midthickness surfaces were opened in Connectome Workbench (Marcus et al., 2011) and manually reviewed. Scans were excluded from analysis if automated segmentation did not correctly isolate the cerebral cortex.

| Surface-based registration
Ciftify (Dickie et al., 2019) was used to translate FreeSurfer output into a format consistent with the Human Connectome Project (HCP) pipeline (Glasser et al., 2013) and to accomplish registration to the MNI space. Surfaces from the native space, as defined in the HCP pipeline, were used as the starting point for longitudinal analysis with aMSM.
aMSM was used to align each subject's baseline and 2-year cortical reconstructions (Robinson et al., 2014; Robinson et al., 2018). As shown in Figure 1, aMSM modifies an input surface, bringing it into alignment with a reference surface and producing an output surface that has point correspondence to the reference surface. Each surface can be represented as a set of vertices that connect to form triangular faces. aMSM evaluates these triangular faces to calculate the cortical deformations between the input and reference surfaces, while penalizing deformations incongruent with the mechanical properties of brain tissue. Uniquely, aMSM penalizes these deformations on the anatomical surface (Figure 1), offering increased accuracy over alternatives that rely on deformations of the spherical surface, which introduce additional artifacts. After registration, the deformation of each triangle provides a detailed, local measure of change in cortical surface area at the resolution of the control point grid (10,242 vertices in this study, corresponding to an average resolution of approximately 20 mm²).
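To make the per-triangle measure concrete, the sketch below computes the area of each triangular face and the log2 change in area between two meshes that share vertex correspondence, which is the situation after registration. It is a minimal NumPy illustration, assuming both surfaces are stored as a `(V, 3)` vertex array with a shared `(F, 3)` face array; the function names are hypothetical, and this is not aMSM's internal implementation, which additionally involves the spherical control-point grid and strain-energy regularization.

```python
import numpy as np


def triangle_areas(vertices: np.ndarray, faces: np.ndarray) -> np.ndarray:
    """Area of each triangular face.

    vertices: (V, 3) float array of x, y, z coordinates (mm).
    faces:    (F, 3) int array of vertex indices, one row per triangle.
    """
    a = vertices[faces[:, 0]]
    b = vertices[faces[:, 1]]
    c = vertices[faces[:, 2]]
    # Half the norm of the cross product of two edge vectors
    return 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)


def triangle_log2_area_change(v_input, v_output, faces):
    """log2 areal change of each triangle between two corresponding meshes.

    Assumes vertex i on the input surface corresponds to vertex i on the
    output surface, so both meshes can share one faces array.
    """
    return np.log2(triangle_areas(v_output, faces) / triangle_areas(v_input, faces))


# Toy example: one right triangle, stretched by 10% along x
faces = np.array([[0, 1, 2]])
before = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
after = before * np.array([1.1, 1.0, 1.0])
print(triangle_areas(before, faces))                    # [2.]
print(triangle_log2_area_change(before, after, faces))  # approximately [0.1375], i.e. log2(1.1)
```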
Each subject's left and right hemispheres were processed separately, and each hemisphere was processed bidirectionally. Bidirectional maps of cortical change, one produced by registration of the baseline surface to the 2-year surface and the other by registration of the 2-year surface to the baseline surface, were averaged to minimize potential bias associated with unidirectional registration.
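The averaging step can be sketched as follows. The text states only that the two directional maps were averaged; this sketch assumes the forward map stores log2(SA2/SA1), the reverse map stores log2(SA1/SA2), and both have already been resampled onto the same vertices, so the reverse map is negated before averaging. The function name and values are hypothetical.

```python
import numpy as np


def average_bidirectional(forward_log2: np.ndarray, reverse_log2: np.ndarray) -> np.ndarray:
    """Combine forward and reverse log2 change maps into one symmetric estimate.

    forward_log2: log2(SA2 / SA1) from registering baseline -> 2-year (assumed).
    reverse_log2: log2(SA1 / SA2) from registering 2-year -> baseline (assumed).
    Both maps are assumed to be defined on the same vertex set already.
    The reverse map is negated so both express baseline -> 2-year change.
    """
    return 0.5 * (forward_log2 - reverse_log2)


fwd = np.array([-0.10, -0.02, 0.00])
rev = np.array([0.06, 0.04, -0.02])
print(average_bidirectional(fwd, rev))  # approximately [-0.08, -0.03, 0.01]
```

Averaging in log space keeps the combined estimate symmetric: swapping which time point is treated as the reference only flips its sign.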
Each aMSM job ran on a single processor, and multiple jobs were run in parallel as computing resources allowed. The median aMSM runtime was 26.08 hr, and 92% of jobs completed in under 36 hr.
FIGURE 1 Longitudinal registration with aMSM. Physical deformation of the cortical surface between two timepoints is calculated for each triplet of vertices on the input surface (red dots) and reference surface (blue dots). As in other popular surface registration techniques (Yeo et al., 2010), vertices from both the input surface and reference surface are projected to a spherical representation to simplify mathematical calculations. However, unlike other registration techniques, aMSM (gray box) regularizes vertex displacement to minimize physical deformation (strain energy based on mechanical properties of brain tissue) between the anatomical surfaces (2a, 2b). Based on these constraints, triplets on sphere 1 are shifted to bring them into alignment with triplets on sphere 2 (3). Once realigned, sphere 1 is projected to a new anatomical output surface (4), resulting in more realistic physical deformations of the cortical surface.

| Global and local measures of atrophy
The primary measures considered in this study were change in cortical thickness, change in surface area, and change in volume. For each subject, change in cortical thickness was calculated as $\log_2(\mathrm{CT}_2/\mathrm{CT}_1)$, where $\mathrm{CT}_1$ refers to cortical thickness at baseline and $\mathrm{CT}_2$ refers to cortical thickness at 2-year follow-up. Both $\mathrm{CT}_1$ and $\mathrm{CT}_2$ are While local thickness can be defined for each vertex, local surface area is calculated from the area of the triangle formed by each triplet of vertices. To obtain a vertex-wise representation of local surface area, each vertex was assigned one third of the area of each triangle to which it belongs (Marcus et al., 2011). Change in cortical surface area was calculated as $\log_2(\mathrm{SA}_2/\mathrm{SA}_1)$, where $\mathrm{SA}_1$ refers to the surface area attributed to a given vertex at baseline and $\mathrm{SA}_2$ refers to the surface area attributed to that vertex at 2-year follow-up. Change in surface area was calculated by aMSM, with the final metric representing the average of the results from each unidirectional registration. Change in cortical volume was calculated as $\log_2\big((\mathrm{SA}_2 \times \mathrm{CT}_2)/(\mathrm{SA}_1 \times \mathrm{CT}_1)\big)$.
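The vertex-wise area assignment and the three log-ratio measures can be sketched in NumPy as below. The helper names and the toy square mesh are illustrative assumptions, not part of the study's pipeline. Because the volume measure is the log of a product, it equals the sum of the thickness and surface area measures at each vertex, which the example checks.

```python
import numpy as np


def vertex_areas(vertices, faces):
    """Per-vertex area: each vertex receives one third of every triangle it belongs to."""
    a, b, c = (vertices[faces[:, k]] for k in range(3))
    tri_area = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)
    areas = np.zeros(len(vertices))
    for k in range(3):
        # np.add.at accumulates correctly when a vertex index appears in many faces
        np.add.at(areas, faces[:, k], tri_area / 3.0)
    return areas


def log2_change_maps(ct1, ct2, sa1, sa2):
    """Vertex-wise log2 change in thickness, surface area, and volume."""
    d_ct = np.log2(ct2 / ct1)
    d_sa = np.log2(sa2 / sa1)
    d_vol = np.log2((sa2 * ct2) / (sa1 * ct1))
    return d_ct, d_sa, d_vol


# Toy mesh: a 1 mm x 1 mm square split into two triangles
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
tris = np.array([[0, 1, 2], [0, 2, 3]])
sa_base = vertex_areas(verts, tris)           # vertices 0 and 2 get 1/3, vertices 1 and 3 get 1/6
sa_follow = vertex_areas(verts * 0.98, tris)  # uniform 2% shrink along each axis
ct_base = np.array([2.5, 2.4, 2.6, 2.5])      # hypothetical thickness values (mm)
ct_follow = np.array([2.45, 2.38, 2.52, 2.5])
d_ct, d_sa, d_vol = log2_change_maps(ct_base, ct_follow, sa_base, sa_follow)
print(np.allclose(d_vol, d_ct + d_sa))  # True
```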
To identify changes occurring at the global level, percent change in thickness, surface area, and volume were calculated at baseline and

| Statistics
To evaluate atrophy at the global level, one-way ANOVA was used to compare measures of atrophy between demographically balanced diagnostic groups. Tukey's Honest Significant Difference test was used to conduct pairwise comparisons. Effect size was also calculated, as Cohen's d, for each pair of diagnostic groups that differed significantly in terms of change in thickness, surface area, or volume.
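A minimal sketch of this global-level comparison follows, using SciPy's `f_oneway` and `tukey_hsd` (the latter requires SciPy 1.8 or later) and a pooled-standard-deviation Cohen's d computed only for pairs that Tukey's test flags at p < .05. The percent-change helper and the group values are illustrative: the numbers are made-up toy values, not ADNI data.

```python
import numpy as np
from scipy import stats


def percent_change(baseline, followup):
    """Global percent change from baseline to follow-up."""
    return 100.0 * (followup - baseline) / baseline


def cohens_d(x, y):
    """Cohen's d using the pooled standard deviation of two independent groups."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)


# Toy per-subject global percent change values (hypothetical, not study data)
groups = {
    "CN": np.array([-0.3, -0.5, -0.2, -0.4, -0.6]),
    "MCI-S": np.array([-0.6, -0.8, -0.5, -0.9, -0.7]),
    "MCI-C": np.array([-1.4, -1.8, -1.2, -1.6, -2.0]),
    "AD": np.array([-2.1, -2.6, -1.9, -2.4, -2.8]),
}
names = list(groups)
samples = list(groups.values())

anova = stats.f_oneway(*samples)
tukey = stats.tukey_hsd(*samples)  # tukey.pvalue is a (k, k) matrix
print(f"ANOVA: F = {anova.statistic:.2f}, p = {anova.pvalue:.3g}")

for i in range(len(names)):
    for j in range(i + 1, len(names)):
        p = tukey.pvalue[i, j]
        if p < 0.05:  # effect size only for significantly different pairs
            d = cohens_d(samples[i], samples[j])
            print(f"{names[i]} vs {names[j]}: Tukey p = {p:.3g}, d = {d:.2f}")
```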

| FreeSurfer longitudinal pipeline
The FreeSurfer longitudinal pipeline is a commonly used surface-based method of measuring cortical atrophy. Like aMSM, the FreeSurfer longitudinal pipeline accomplishes longitudinal registration, allowing for the measurement of structural cortical change within a single subject over time. To directly compare these two registration methods, we reanalyzed the CN, MCI-S, MCI-C, and AD diagnostic groups using the FreeSurfer longitudinal pipeline instead of aMSM.
The FreeSurfer surfaces described in the "MRI acquisition and segmentation" section were used as a starting point for processing with the FreeSurfer longitudinal pipeline. This pipeline produced aligned baseline and 2-year follow-up surfaces for each subject. These surfaces were processed with ciftify and changes in thickness, surface area, and volume were calculated for each subject as described in the "Global and local measures of atrophy" section. Once atrophy maps had been generated for each subject, statistical analyses were performed using PALM as described in the "Statistics" section.

| Region of interest analysis
ROI analysis is a commonly used alternative to the continuous surface maps that are the focus of this study. To compare this methodology to results produced using aMSM, we reanalyzed atrophy in the CN, MCI-S, MCI-C, and AD diagnostic groups in terms of ROIs. The same FreeSurfer segmented and ciftify-processed surfaces that were used in our primary analysis were used for ROI analysis.
ROIs were based on the Desikan-Killiany (DK) protocol (Desikan et al., 2006). Within each ROI, average thickness and total surface area were calculated for each subject at both baseline and 2-year follow-up. Change in cortical thickness, surface area, and volume were calculated for each subject, at all ROIs, using the equations described in the "Global and local measures of atrophy" section.
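Per-ROI aggregation can be sketched as follows, assuming each vertex already carries an integer ROI label from the Desikan-Killiany parcellation along with per-vertex thickness and area values at both time points. The sketch takes an unweighted mean of vertex thickness (an assumption; an area-weighted mean is another reasonable choice), sums vertex areas, and then applies the same log2 ratios as the vertex-wise analysis, with ROI volume approximated as total area times mean thickness. The function name and toy arrays are hypothetical.

```python
import numpy as np


def roi_log2_change(labels, ct1, ct2, sa1, sa2):
    """Per-ROI log2 change in mean thickness, total area, and area x thickness volume."""
    results = {}
    for roi in np.unique(labels):
        in_roi = labels == roi
        t1, t2 = ct1[in_roi].mean(), ct2[in_roi].mean()  # average thickness
        a1, a2 = sa1[in_roi].sum(), sa2[in_roi].sum()    # total surface area
        results[int(roi)] = {
            "thickness": np.log2(t2 / t1),
            "area": np.log2(a2 / a1),
            "volume": np.log2((a2 * t2) / (a1 * t1)),
        }
    return results


# Toy data: three vertices in two ROIs (labels 0 and 1)
labels = np.array([0, 0, 1])
change = roi_log2_change(
    labels,
    ct1=np.array([2.0, 2.0, 3.0]), ct2=np.array([1.0, 1.0, 3.0]),
    sa1=np.array([1.0, 1.0, 2.0]), sa2=np.array([1.0, 1.0, 2.0]),
)
print(change[0]["thickness"], change[1]["volume"])  # -1.0 0.0
```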
One-way multivariate ANOVA (MANOVA) was used to identify differences in atrophy between diagnostic groups. Separate MANOVAs were run for change in cortical thickness, change in cortical surface area, and change in cortical volume. Left and right hemispheres were analyzed separately. Differences between specific groups were assessed using pairwise comparisons with Bonferroni correction.
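The Bonferroni step for pairwise follow-up comparisons can be sketched as below. The text does not specify which pairwise test followed each MANOVA; this sketch assumes independent two-sample t-tests (`scipy.stats.ttest_ind`) on a single ROI's change values and multiplies each p-value by the number of comparisons, capping at 1. Group values are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def bonferroni(pvals):
    """Bonferroni-adjust p-values: multiply by the number of tests, cap at 1."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)


def pairwise_bonferroni(groups):
    """All pairwise two-sample t-tests between groups, Bonferroni-adjusted."""
    pairs = list(combinations(groups, 2))
    raw = [stats.ttest_ind(groups[a], groups[b]).pvalue for a, b in pairs]
    return dict(zip(pairs, bonferroni(raw)))


# Hypothetical log2 change in one ROI's surface area
roi_groups = {
    "CN": np.array([-0.004, -0.006, -0.002, -0.005]),
    "MCI-C": np.array([-0.012, -0.015, -0.010, -0.014]),
    "AD": np.array([-0.020, -0.025, -0.018, -0.022]),
}
adjusted = pairwise_bonferroni(roi_groups)
for pair, p_adj in adjusted.items():
    print(pair, round(float(p_adj), 4))
```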

| Comparisons using 1.5 T Scans
To assess the influence of MRI field strength on the measurement of atrophy, we identified subjects from our primary 3.0 T dataset who had baseline and 2-year follow-up scans performed on both 1.5 and 3.0 T scanners. Matching 1.5 and 3.0 T datasets were analyzed in parallel to identify the impact of field strength on changes in thickness, surface area, and volume. One-sample t-tests were performed using PALM, as described in the "Statistics" section, to define patterns of atrophy within each diagnostic group at each field strength.
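The idea behind a permutation-based one-sample test can be sketched with sign flipping and a maximum statistic across vertices, which controls the family-wise error rate. This is a simplified stand-in, not PALM itself: it omits threshold-free cluster enhancement (TFCE) and surface neighborhood information, and it assumes change values are symmetric about zero under the null. The data are synthetic, and the function names are hypothetical.

```python
import numpy as np


def one_sample_t(x):
    """Column-wise one-sample t statistic against a mean of zero."""
    n = x.shape[0]
    return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(n))


def sign_flip_max_t(data, n_perm=1000, seed=0):
    """Vertex-wise one-sided test for atrophy (mean change < 0) with FWE control.

    data: (n_subjects, n_vertices) array of log2 change values.
    Each permutation flips the sign of whole subjects; the maximum statistic
    across vertices forms the null distribution used for FWE correction.
    """
    rng = np.random.default_rng(seed)
    n_subj = data.shape[0]
    t_obs = -one_sample_t(data)  # negate so that atrophy gives large positive values
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=(n_subj, 1))
        max_null[i] = np.max(-one_sample_t(data * signs))
    # Count the observed labelling as one member of the null distribution
    p_fwe = (1 + (max_null[None, :] >= t_obs[:, None]).sum(axis=1)) / (n_perm + 1)
    return t_obs, p_fwe


# Synthetic example: 12 subjects, 100 vertices, atrophy only in the first 10
rng = np.random.default_rng(1)
demo = rng.normal(0.0, 0.01, size=(12, 100))
demo[:, :10] -= 0.05
t_map, p_map = sign_flip_max_t(demo)
print((p_map < 0.05).sum(), "vertices significant at FWE p < .05")
```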
We also used the larger cohort of ADNI1 subjects, scanned at 1.5 T, to examine the relative performance of surface area and thickness measures at a variety of sample sizes. Demographically balanced 90-subject samples were selected for each diagnostic group. To test for significant differences between diagnostic groups in age, education, gender, race, ethnicity, study site, and MRI scanner manufacturer, one-way ANOVA, the chi-square test, or the exact multinomial test of goodness-of-fit was used, as appropriate. To assess performance of 1.5 T scans at lower sample sizes, smaller samples of 12, 24, and 48 subjects per diagnostic group were randomly selected (Urbaniak & Plous, 2013) from within the 90-subject samples.
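The random selection of smaller samples can be sketched with NumPy's random generator. The study used the Research Randomizer (Urbaniak & Plous, 2013), and the text does not say whether the 12-, 24-, and 48-subject samples were nested; this sketch draws each size independently from the 90-subject pool, which is an assumption. The subject IDs are hypothetical.

```python
import numpy as np


def draw_subsamples(subject_ids, sizes=(12, 24, 48), seed=0):
    """Randomly draw one subsample of each size, without replacement, from a group.

    Each size is drawn independently from the full pool (an assumption; nested
    draws would be an equally plausible design).
    """
    rng = np.random.default_rng(seed)
    pool = np.asarray(subject_ids)
    return {n: rng.choice(pool, size=n, replace=False) for n in sizes}


# Hypothetical subject IDs for one 90-subject diagnostic group
group_ids = [f"subj_{i:03d}" for i in range(90)]
subsamples = draw_subsamples(group_ids)
for n, ids in subsamples.items():
    print(n, ids[:3])
```

Fixing the seed makes the draw reproducible, which matters when the same subsamples must be reused across thickness, surface area, and volume analyses.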
ANOVA, performed using PALM as described in the "Statistics" section, was used to identify differences in atrophy between diagnostic groups at each sample size.
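A between-group version of the same permutation logic can be sketched by shuffling diagnostic labels and taking the maximum F statistic across vertices. As with the one-sample sketch, this is a simplified stand-in for PALM: it uses unrestricted label permutation, no covariates, and no TFCE. The function names are hypothetical.

```python
import numpy as np


def f_map(data, labels):
    """One-way ANOVA F statistic at every vertex.

    data:   (n_subjects, n_vertices) array of change values.
    labels: (n_subjects,) array of group labels.
    """
    groups = np.unique(labels)
    grand_mean = data.mean(axis=0)
    ss_between = np.zeros(data.shape[1])
    ss_within = np.zeros(data.shape[1])
    for g in groups:
        x = data[labels == g]
        group_mean = x.mean(axis=0)
        ss_between += x.shape[0] * (group_mean - grand_mean) ** 2
        ss_within += ((x - group_mean) ** 2).sum(axis=0)
    df_between = len(groups) - 1
    df_within = data.shape[0] - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def permutation_anova_fwe(data, labels, n_perm=1000, seed=0):
    """Permute group labels; use the maximum F across vertices for FWE control."""
    rng = np.random.default_rng(seed)
    f_obs = f_map(data, labels)
    max_null = np.array(
        [f_map(data, rng.permutation(labels)).max() for _ in range(n_perm)]
    )
    p_fwe = (1 + (max_null[None, :] >= f_obs[:, None]).sum(axis=1)) / (n_perm + 1)
    return f_obs, p_fwe


# Hand-checkable toy case: groups {1, 2, 3} and {4, 5, 6} at a single vertex
toy = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
toy_labels = np.array([0, 0, 0, 1, 1, 1])
print(f_map(toy, toy_labels))  # [13.5]
```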

| Subjects
In total, 89 subjects met eligibility criteria for inclusion in our primary, 3.0 T analysis. Of those subjects, 18 were removed following postsegmentation quality review and another five failed to successfully complete longitudinal registration with aMSM. Of the 66 subjects remaining after MRI processing, an additional 16 were removed from analysis to demographically balance the four diagnostic groups. This left us with a relatively small cohort of 50 subjects, comparable to the small sample sizes in pilot studies or observational studies focused on specific AD populations or subtypes (Machado et al., 2020;Persson et al., 2017). Demographic characteristics of the CN, MCI-S, MCI-C, and AD groups are presented in Table 1. The manufacturers of the MRI machines used to scan this cohort are presented in Table 2, and the study sites visited by subjects in the four diagnostic groups are presented in Table S1.
A total of 499 ADNI1 subjects met eligibility criteria for inclusion in our secondary, 1.5 T analysis. Of these, 16 subjects failed segmentation and four failed registration. Of the remaining 479 subjects, 160 were in the CN group, 127 in the MCI-S group, 92 in the MCI-C group, and 100 in the AD group. Forty-four subjects (11 in the CN group, 9 in the MCI-S group, 12 in the MCI-C group, and 12 in the AD group) also had successfully processed 3.0 T scans. These 44 subjects were used, along with their 3.0 T counterparts, in our analysis of the influence of field strength on the measurement of atrophy.
For our analysis of the influence of sample size on the measurement of atrophy, the 479 successfully processed subjects with 1.5 T scans served as the pool for selecting demographically balanced 90-subject samples within each diagnostic group. Demographic characteristics of these samples are presented in Table 3. The manufacturers of the MRI machines used to scan these subjects are presented in Table 4, and study sites visited are presented in Table S2. The 12-, 24-, and 48-subject samples used in our sample size analysis were randomly selected from within the samples presented in Table 3.
Demographic and MRI scanner manufacturer information for the 12, 24, and 48 subject samples is presented in Tables S3-S8.

| Global atrophy
Comparisons of global loss in thickness, surface area, and volume were conducted between the four diagnostic groups: CN, MCI-S, MCI-C, and AD. As shown in Figure

| Individual atrophy maps
In order to identify patterns of atrophy within each diagnostic group, we first calculated individual atrophy maps for each subject. Figure 3 demonstrates maps of change in unsmoothed thickness, smoothed thickness, surface area, and volume for one subject in the AD group.

| Patterns in atrophy by diagnostic group
Within each diagnostic group, patterns associated with change in cortical thickness, surface area, and volume were identified using one-sample t-tests (Figure 4). Across all metrics, relatively little atrophy was identified in the CN and MCI-S groups, but considerable atrophy was present in the more symptomatic MCI-C and AD groups.
These latter groups showed particularly strong patterns of atrophy in the temporal and parietal lobes, though atrophy was detected throughout much of the cortex. For all groups except CN, the greatest degree of atrophy was detected by volume loss, which combines thickness and surface area loss.
Surface area and thickness loss identified similar regions of atrophy. However, the specific patterns and degrees of atrophy detected by each measure differed. Significant decreases in surface area tended to be detected over broader regions of cortex, while decreases in thickness were more localized. There were also some differences in the specific regions of atrophy identified by these two measures. For
FIGURE 2 Global loss in thickness, surface area, and volume by diagnostic group. Differences between diagnostic groups (CN, MCI-S, MCI-C, AD) were identified by ANOVA. Significant differences with .01 ≤ p < .05 are identified with the symbol *. Very significant differences with p < .01 are identified with the symbol **. Diamond symbols indicate group means. Lower and upper box edges represent the first quartile (Q1) and third quartile (Q3), respectively, with the area within the box representing the interquartile range (IQR). Horizontal lines inside boxes represent group medians. Whiskers extend to the minimum and maximum nonoutlier values, where outliers are defined as falling below (Q1 - 1.5 × IQR) or above (Q3 + 1.5 × IQR). Outliers are indicated with circles.
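The box-plot conventions in the Figure 2 caption (quartiles, 1.5 × IQR fences, whiskers at the most extreme nonoutlier values) can be reproduced with the sketch below. It assumes NumPy's default linear-interpolation percentiles; the plotting software used for the figure may compute quartiles slightly differently. The function name and values are hypothetical.

```python
import numpy as np


def box_summary(values):
    """Quartiles, whisker ends, and outliers using 1.5 x IQR fences."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= low_fence) & (values <= high_fence)]
    return {
        "q1": q1, "median": median, "q3": q3, "mean": values.mean(),
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": values[(values < low_fence) | (values > high_fence)],
    }


summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary["q1"], summary["q3"], summary["outliers"])  # 3.25 7.75 [30.]
```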

| Areas of increased atrophy in the AD and MCI-C diagnostic groups
To compare differences in atrophy among the four diagnostic groups, ANOVA was used to compare groups within each atrophy metric (loss in thickness, surface area, or volume). Divergent patterns of atrophy between the AD and CN groups, the MCI-C and CN groups, and the MCI-C and MCI-S groups are illustrated in Figure 5. The AD and MCI-S groups also showed different patterns of atrophy, but only in the right hemisphere. These differences are similar to those seen in the right hemisphere when comparing the AD and CN groups.

| Influence of field strength on measurements of atrophy
For a subset of the ADNI1 cohort, subjects were scanned using both 1.5 and 3.0 T acquisition parameters at baseline and 2-year time points.
To explore aMSM's ability to measure atrophy consistently and reliably under different image acquisition parameters, we analyzed and
FIGURE 5 Areas of increased 2-year atrophy in the AD and MCI-C diagnostic groups relative to the CN and MCI-S groups. Diagnostic groups were compared to identify differences in patterns of atrophy, and significant differences were found between AD and CN groups (
Overall, change in surface area was less affected than change in thickness by differences in field strength. Loss in surface area was detected in the same regions of the cortex regardless of field strength, though there was generally more evidence of atrophy in the 3.0 T analysis. By contrast, thickness-based measures of atrophy were less consistent. While some common trends were detected in both the 1.5 and 3.0 T analyses, considerable variation also existed. Maps of change in volume (Figure S5) identified more atrophy than change in surface area or change in thickness alone but, due to variation in the thickness measure, did not measure atrophy as consistently as change in surface area.

| Influence of sample size on measurements of atrophy
Finally, to assess the reliability of aMSM results obtained using different sample sizes, we considered subsets of the large 1.5 T dataset. The largest sample considered is described in Table 3. When comparing the MCI-C and CN groups (Figures 9 and 10, middle row) or the MCI-C and MCI-S groups (Figures 9 and 10, bottom row), surface area and thickness measures performed similarly. In both of these comparisons, few significant differences were identified in groups of fewer than 48 subjects, while the 48- and 90-subject group comparisons identified similar trends, including some substantial areas of atrophy. Few significant differences were likely identified between the 24-subject groups because the overall effect sizes were smaller than those seen in the AD and CN comparison (Figures S7 and S8).
FIGURE 6 Areas of increased 2-year atrophy in the AD group relative to the CN group, calculated using aMSM, the FreeSurfer longitudinal pipeline, and predefined ROIs. Areas in which thickness (left column), surface area (middle column), or volume (right column) loss differ between the AD and CN groups are shown. p values for the continuous surface maps produced using aMSM (top row) and the FreeSurfer longitudinal pipeline (middle row) are threshold-free cluster enhanced with family-wise error correction. p values associated with specific ROIs (bottom row) were adjusted with Bonferroni correction.
Notably, our primary analysis of subjects scanned at 3.0 T was able to detect significant differences in atrophy between groups when considering 12-13 subjects per group (Figure 5). However, we were unable to detect such differences between the 12-subject 1.5 T groups (Figures 9 and 10). Trends that were detected in the small 3.0 T groups did tend to align with areas of increased atrophy in the larger 1.5 T groups, with even broader regions reaching statistical significance in the largest 1.5 T groups.

| DISCUSSION
This study compared three measures of cortical atrophy: thickness loss, surface area loss, and volume loss. By utilizing aMSM for longitudinal registration, we produced maps of change in surface area, optimized with anatomically constrained deformations, which offer advantages over maps produced by commonly used spherically based methodologies and provide detail that cannot be achieved by global or ROI-based analyses.
To illustrate the benefit of this method relative to other existing approaches, we compared maps produced with aMSM to two common alternatives, the FreeSurfer longitudinal pipeline and ROI analysis. Furthermore, to characterize the reliability of atrophy metrics derived from thickness, surface area, or volume, we examined the impact of MRI field strength and sample size on aMSM-derived atrophy maps.

| Surface area loss as a complementary measure of cortical atrophy
Our results suggest that mapping change in surface area is a viable method of detecting atrophy, with group analysis producing results of comparable significance to those detected using change in thickness (Figures 4 and 5). These similar levels of significance were detected despite the fact that surface area maps did not undergo the smoothing procedures that were applied to thickness maps.
FIGURE 7 Areas of increased 2-year atrophy in the MCI-C group relative to the CN group, calculated using aMSM, the FreeSurfer longitudinal pipeline, and predefined ROIs. Areas in which thickness (left column), surface area (middle column), or volume (right column) loss differ between the MCI-C and CN groups are shown. p values for the continuous surface maps produced using aMSM (top row) and the FreeSurfer longitudinal pipeline (middle row) are threshold-free cluster enhanced with family-wise error correction. p values associated with specific ROIs (bottom row) were adjusted with Bonferroni correction.
Overall It has been found that cortical surface area and thickness are both heritable traits with separate genetic influences (Grasby et al., 2020;Panizzon et al., 2009), and the two measures are not equally impacted as volume is lost in AD. In medial temporal lobe structures specifically, surface area loss is thought to be characteristic of normal aging, while AD is associated with additional significant thickness loss (Dickerson et al., 2009). Further, thinning of the entorhinal and perirhinal cortices was found to correlate with memory performance, while decreases in surface area did not (Dickerson et al., 2009) particularly when evaluating small cohorts (Figures 6 and 7). Notably, at the small sample sizes used in this study, the FreeSurfer longitudinal pipeline was entirely unable to detect significant differences in surface area loss between diagnostic groups. ROI analysis was able to detect significant differences in certain predefined regions, such as the middle temporal gyrus, but this methodology sacrifices detail by forcing results to conform to predefined ROIs. Improvements in the vertex-wise measurement of change in surface area, such as those possible with aMSM, provide a critical tool in disentangling the influences of thickness and surface area loss on cortical atrophy in AD and other neurodegenerative diseases.

| Surface area loss as a more precise measure of cortical atrophy
While thickness and volume loss remain important measures of atrophy, our results suggest that change in surface area, when precisely and accurately mapped using a tool such as aMSM, possesses considerable utility as an independent measure of atrophy. To illustrate the performance of each metric, we first considered a relatively small subset of subjects from the ADNI1 dataset. When we compared maps of atrophy derived from the same subjects over the same period of time, but scanned at different field strengths, we found that change in surface area was more consistent across field strengths than was change in thickness (Figure 8). This suggests that change in surface area may offer increased precision over change in thickness when analyzing scans of varying image quality. We also observed that, when comparing results obtained using a range of sample sizes, trends in surface area loss were more consistent across sample sizes than were trends in thickness loss (Figures 9 and 10). In this case, surface area loss offered increased statistical sensitivity over thickness loss when considering small samples. While volume loss, the combination of thickness and surface area loss, tended to be most sensitive across sample sizes (Figures 9, 10, and S6), it suffered from the loss of precision inherent in the thickness measure (Figure S5).
The decreased precision associated with measures of change in thickness is not surprising, given that the thickness of the cortex ranges from 1 to 4.5 mm (Fischl & Dale, 2000), and MRI scans, including those used in this study (Jack et al., 2008), are often acquired with a target isotropic voxel size of 1 mm³. In these conditions, even minimal bias or measurement error can become problematic when calculating local change. To compound this problem, thickness maps calculated using surface-based methods may introduce bias (Scott, Bromiley, Thacker, Hutchinson, & Jackson, 2009), and precision is further compromised by the use of smoothing procedures (Bernal-Rusiel et al., 2010). These issues become particularly problematic when attempting to measure change within a single subject. By contrast, the surface area attributed to any vertex considered in this analysis is on the order of 20 mm², and high-quality reconstruction of the cortical surface represents the original goal of popular reconstruction packages such as FreeSurfer.
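A first-order (delta-method) approximation makes the scale argument explicit. If a quantity $x$ is measured at two time points with independent noise of standard deviation $\sigma$, then $\mathrm{SD}[\log_2(x_2/x_1)] \approx \sqrt{2}\,\sigma/(x \ln 2)$, so the same absolute error produces more noise in a change map the smaller the measured quantity is relative to that error. The sketch below checks the approximation by simulation; the 2.5 mm thickness and 0.1 mm noise level are hypothetical values chosen for illustration, not estimates from this study.

```python
import numpy as np


def log2_ratio_sd(true_value, noise_sd):
    """Delta-method SD of log2(x2 / x1) for two independent noisy measurements of one value."""
    return np.sqrt(2.0) * noise_sd / (true_value * np.log(2.0))


# Hypothetical: 2.5 mm true thickness, 0.1 mm measurement noise at each time point
rng = np.random.default_rng(0)
x1 = 2.5 + rng.normal(0.0, 0.1, 200_000)
x2 = 2.5 + rng.normal(0.0, 0.1, 200_000)
approx = log2_ratio_sd(2.5, 0.1)
simulated = np.std(np.log2(x2 / x1))
print(f"delta method: {approx:.4f}, simulation: {simulated:.4f}")  # both about 0.082
```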
This highlights a final benefit associated with maps of change in surface area generated by aMSM, which has been optimized to produce physically accurate surface deformations.
While additional work is needed to optimize the thickness dimension for atrophy mapping at the individual subject level, the results in this paper suggest that information gleaned from surface area alone is highly accurate and may be clinically useful to highlight unique regions undergoing cortical atrophy. This ability to track cortical changes in individual subjects offers a foundation to enable more personalized patient care and has the potential to aid in the implementation of precision medicine in AD detection and management.

| Limitations
We note that, while surface area loss, measured with aMSM, offers advantages over thickness loss in small sample sizes, sample size remains an important consideration in study design. In this study, about 20% of potential subjects considered for our primary 3.0 T analysis were excluded due to failed or poor-quality segmentation with FreeSurfer. This suggests that future analyses may benefit from consideration of alternate segmentation tools. Particularly in studies of Alzheimer's disease, tools such as MaCRUISE (Huo et al., 2016), which has demonstrated efficacy in older populations, may allow for a higher segmentation success rate. While the focus of this paper has been to establish the benefits of aMSM as a registration tool, studies of cortical atrophy may find additional advantage in the careful selection of a segmentation tool suited to the population of study.
We also acknowledge that multi-scanner, multi-site data can be complicated by unwanted scanner effects. While we have attempted to minimize these effects by using standardized datasets and balancing our diagnostic groups in terms of site and scanner manufacturer, we did not account for other differences in MRI protocol that may exist, which could confound physiologic trends. Furthermore, differences between 1.5 and 3.0 T results must be interpreted cautiously, since scanner type and manufacturer vary dramatically between these two datasets (Tables 2, 4, and S6-S9). Future studies using ADNI data or other multi-site datasets may benefit from a more comprehensive data harmonization process, such as the methods derived from ComBat (Johnson, Li, & Rabinovic, 2007) that have been employed in some recent analyses of structural MRI scans (Beer et al., 2020; Pomponio et al., 2020). In this study, longitudinal measures for each individual (surface area loss, thickness loss, and volume loss) were calculated from scans collected on the same scanner, which may mitigate some scanner effects. However, trends within and between diagnostic groups may be expected to increase in accuracy with improved data harmonization. As such, data harmonization remains an important consideration and a direction for future study that cannot be overlooked.

| CONCLUSION
In summary, this study presents a viable method of accomplishing longitudinal registration and producing maps of cortical surface area loss.
With this approach, we identified atrophy at similar levels of significance using either maps of change in thickness or change in surface area. Notably, however, surface area maps did not require smoothing and thus avoided a source of lost precision present in our thickness-based analysis. We illustrated that aMSM is more effective than existing methods at detecting surface area loss, producing improved maps of surface area loss that can benefit future studies seeking to elucidate the precise contributions of thickness and surface area to atrophy in AD and other neurodegenerative disorders. These maps are also a promising means of evaluating small subpopulations, due to their increased precision compared to maps of change in thickness or change in volume. Finally, this method offers the potential for improved mapping of atrophy in individual subjects, an advancement that could prove clinically useful in AD detection and management.