Age‐related change in task‐evoked amygdala—prefrontal circuitry: A multiverse approach with an accelerated longitudinal cohort aged 4–22 years

Abstract The amygdala and its connections with medial prefrontal cortex (mPFC) play central roles in the development of emotional processes. While several studies have suggested that this circuitry exhibits functional changes across the first two decades of life, findings have been mixed ‐ perhaps resulting from differences in analytic choices across studies. Here we used multiverse analyses to examine the robustness of task‐based amygdala—mPFC function findings to analytic choices within the context of an accelerated longitudinal design (4–22 years‐old; N = 98; 183 scans; 1–3 scans/participant). Participants recruited from the greater Los Angeles area completed an event‐related emotional face (fear, neutral) task. Parallel analyses varying in preprocessing and modeling choices found that age‐related change estimates for amygdala reactivity were more robust than task‐evoked amygdala—mPFC functional connectivity to varied analytical choices. Specification curves indicated evidence for age‐related decreases in amygdala reactivity to faces, though within‐participant changes in amygdala reactivity could not be differentiated from between‐participant differences. In contrast, amygdala—mPFC functional connectivity results varied across methods much more, and evidence for age‐related change in amygdala—mPFC connectivity was not consistent. Generalized psychophysiological interaction (gPPI) measurements of connectivity were especially sensitive to whether a deconvolution step was applied. Our findings demonstrate the importance of assessing the robustness of findings to analysis choices, although the age‐related changes in our current work cannot be overinterpreted given low test–retest reliability. Together, these findings highlight both the challenges in estimating developmental change in longitudinal cohorts and the value of multiverse approaches in developmental neuroimaging for assessing robustness of results.

While the small sample sizes examined in many studies on amygdala-mPFC development likely contribute to differences in findings (Marek et al., 2020), especially given low reliability of many amygdala-mPFC measures (Elliott et al., 2020;Herting, Gautam, Chen, Mezher, & Vetter, 2017;Sauder, Hajcak, Angstadt, & Phan, 2013), important methodological differences also exist across studies. Differences in age range or sample demographics, stimuli, task (e.g., passive viewing vs. emotion labeling or matching ;Lieberman et al., 2007), task design (blocked vs. event-related; Sergerie, Chochol, & Armony, 2008), or contrast used (faces > shapes vs. faces > baseline) may also contribute to discrepancies (see Table S1). The brain regions under investigation also differ across studies; for example, prefrontal clusters with which amygdala connectivity was assessed. Interpreting discrepancies across studies without appreciation for these methodological differences would be inappropriate, and in fact, incorrect. Yet, such differences do not account for all discrepancies in findings across studies. Variation in processing pipelines is another source of differences across studies, as varying analytic decisions can produce qualitatively different findings, even between putatively identical analyses of the same dataset (Botvinik-Nezer et al., 2020). Choices including software package (Bowring, Maumet, & Nichols, 2019), spatial smoothing (Jo et al., 2007), treatment of head motion (Achterberg & van der Meulen, 2019), parcellation (Bryce et al., 2021), and functional connectivity approach (Di, Zhang, & Biswal, 2020) can also impact results and qualitatively change findings (Cisler, Bush, & Steele, 2014). Additionally, the majority of developmental investigations of amygdala-mPFC function have studied cross-sectional samples. Because cross-sectional studies are susceptible to cohort effects and cannot measure within-participant change, longitudinal work has been recommended for better charting of developmental trajectories (Crone & Elzinga, 2015;Madhyastha et al., 2018).
Here, we used multiverse analyses to examine the robustness of developmental changes to varied analytical decisions. We focused on task-related amygdala-mPFC functional development in an accelerated longitudinal sample ranging from ages 4 to 22 years. We selected a task that was designed to be appropriate for young ages to characterize developmental change in amygdala-mPFC responses to fear and neutral faces across childhood and adolescence, and we asked whether findings were robust to analytical choices. This accelerated longitudinal design is an extension of the sample reported on by Gee et al. (2013).
We preregistered two hypotheses (https://osf.io/8nyj7/) predicting that both amygdala reactivity (1) and amygdala-mPFC connectivity (2) as measured with generalized psychophysiological interaction models (gPPI), would decrease as a function of age during viewing of fearful faces relative to baseline (see Table 1 Aims 1a and 2a).
Although we did not preregister further hypotheses, we also investigated age-related changes in within-scan differences in amygdala responses across trials and FC using beta series correlations. As T A B L E 1 Summary of main aims, hypotheses, methods, and findings Aim Preregistered hypothesis Analysis methodology Key findings 1a. Age-related change in amygdala reactivity to fear faces Amygdala reactivity to fearful faces will decrease with age, such that younger children will have more positive amygdala reactivity (higher BOLD response to fear faces relative to implicit baseline) than older youth.
Multiverse amygdala ROI (anatomically-defined) analysis using multilevel linear regression at the group level. • No evidence that amygdala reactivity, amygdala-mPFC connectivity, or change in amygdala reactivity across trials were associated with separation anxiety behaviors previous work identified associations between amygdala-mPFC FC and separation anxiety (Carpenter et al., 2015;Gee et al., 2013), we asked whether any amygdala-mPFC measures were associated longitudinally with separation anxiety behaviors (see Table 1 Aim 3). We used "multiverse" analyses and specification curves to examine the impact of analytical decisions on results. We also investigated testretest reliability of all brain measurements across longitudinal study visits, given the importance of such reliability for interpreting individual differences or developmental change (Herting et al., 2017). Our multiverse approach allows us to thoroughly explore the robustness of different findings to analytical choices, highlighting the importance of considering both robustness and reliability in developmental research.

| METHODS
Before completing analyses, we preregistered methods for the current study through the Open Science Framework at https://osf.io/8nyj7/.
Only analyses for age-related changes in amygdala reactivity and amygdala-mPFC gPPI were preregistered in detail (see Table 1 Aims 1a and 2a), and we did not preregister multiverse analyses. Methods detailed below include both information included in the preregistration and additional information and analyses not preregistered. Analysis code and documentation can be found at https://github.com/ pab2163/amygdala_mpfc_multiverse.

| Participants
Participants were recruited as part of a larger study examining brain development as a function of early life caregiving experiences. The current sample (N = 98; 55 female, 43 male) included typically developing children, adolescents, and young adults covering the ages 4-22 years-old (M = 11.9 years old) who enrolled to participate in a study on emotional development. All participants were reported to be physically and psychiatrically healthy (no medical or psychiatric disorders), as indicated by a telephone screening before participation, and free of MRI contraindications. All except four participants fell below clinical cutoffs (see Figure S2) on the Child Behavior Checklist (CBCL) Total Problems, Internalizing Problems, and Externalizing Problems scales (Achenbach, 1991 An accelerated longitudinal design was used such that participants' starting ages at scan 1 comprised the entire range of sample ages (4-22 years old), and coverage was approximately balanced across the entire age range (see Figure 1b). The design was structured into three study "waves" corresponding with recruitment efforts and visit protocols. Participants were overenrolled at Wave 1 to account for anticipated attrition (e.g., braces, relocation, etc.) to achieve the desired sample size across the three waves. While most participants were recruited such that their first scan occurred at Wave 1, a smaller set of participants were recruited at Wave 2, such that some participants completed their first scan at Wave 2 (see Figure 1). For these participants, only two scans were planned.
Of the 191 participants participating in the longitudinal study, 140 completed at least one MRI scan. After exclusions for incomplete task runs (including falling asleep), computer errors resulting in missing stimulus timing files, high head motion, and failed visual QA (scanner/ motion artifacts), a final sample of 98 participants (N = 183 total scans) was included for analysis (see Figure 1). Exclusion criteria were preregistered after conducting preliminary preprocessing, but before construction of group-level models and multiverse analysis plans. This sample included 40 participants with 1 scan (including 10 participants ages 18-22 not enrolled for additional scan sessions), 31 with 2 scans, and 27 with 3 scans (one more participant than preregistered due to an initial coding error). Wave 1 data from 42 of these participants were reported on by Gee et al. (2013).

| Separation anxiety
For each participant (except for 10 adults 18-22 years), a parent completed both the Revised Children's Anxiety and Depression Scale (RCADS-P) and the Screen for Child Anxiety Related Emotional Disorders (SCARED-P) to assess the frequency of symptoms of anxiety and low mood (Birmaher et al., 1999;Chorpita, Yim, Moffitt, Umemoto, & Francis, 2000). Following prior work suggesting associations between task-evoked amygdala-mPFC FC and separation anxiety (Carpenter et al., 2015;Gee et al., 2013), we used the separation anxiety subscales from both the SCARED-P and RCADS-P as measures of anxiety-related behaviors in asking whether such FC may be linked to anxiety levels during childhood and adolescence. For 11 participants who had missing items on the SCARED-P, indicating parents had skipped or forgotten to answer a question, we imputed responses using 5-Nearest Neighbor imputation using only the other items included in the SCARED-P separation anxiety subscale (Beretta & Santaniello, 2016). As expected, raw separation anxiety scores on both measures decreased as a function of age, while standardized scores (which are normed based on gender and grade level) were consistent across development with few children at or near clinical threshold (see Figure 6).

| Emotion discrimination task
Participants completed either two (at Wave 3) or three (at Waves 1 and 2) runs of a modified "go/no-go" task with emotional faces during fMRI scanning. Runs varied by emotional expression (fear, happy, sad), and within each run participants viewed emotional faces interspersed with neutral faces. To ensure that participants were paying attention, they were asked to press a button whenever they saw a neutral face (no response was required for any other face expression).
The order of the runs was counterbalanced across participants; the stimuli within each run were pseudorandomized (Wager & Nichols, 2003) to allow for event-related estimates of the hemodynamic response, and fixed across participants. For the present analysis, only the fear run of the task was used. The other two runs, which used happy and sad faces in place of fear, are not included in the present analysis as these conditions were not present at all waves of data collection. As 50% of trials were "go" trials under this paradigm, we refer to the task as an emotion discrimination task, rather than a true "go/no-go" paradigm since there was no strong prepotent motor response. Stimuli within each run were presented with a jittered ITI (3-10 s, median = 4.93 s) according to a genetic algorithm with a fixation cross on the screen (Wager & Nichols, 2003). Face images were adult White female faces from the Karolinska Directed Emotional Faces database (Calvo & Lundqvist, 2008), and the same face stimuli were used across longitudinal study visits (Vijayakumar et al., 2019).
Each run (130 TRs, duration of 4:20) consisted of 48 trials (24 neutral faces, 24 fearful faces), each presented for 350 ms. All fMRI analyses of this task used event-related designs. Wave 3 was collected on a Siemens 3T TIM Trio MRI scanner at a different location using identical acquisition parameters.

| Behavioral analyses
We used multilevel logistic regression models to estimate age-related changes in several task performance metrics. We fit separate models for the d 0 performance metric, overall accuracy (probability of a correct response on any trial), hit rate (on neutral face trials), and false alarm rate (on fear face trials) as the respective outcomes, and included nested random effects for task sessions within participants (models were not nested for d 0 as this analysis used only 1 metric per session rather than trial-wise outcomes, but still included random effects for participants). Additionally, to model age-related change in reaction times during correct hit trials, we fit linear, quadratic, cubic,

| Preregistered fMRI pipeline
3dskullstrip from the Analysis of Functional NeuroImages (AFNI, v20.1.16) software package (Cox, 1996)  FNIRT (nonlinear registration) with 12 DOF and a warp resolution of 10 mm. Data were high-pass filtered at 0.01 Hz and smoothed with an isotropic Gaussian kernel with FWHM of 6 mm before running general linear models (GLMs), and four-dimensional volumes were grand mean scaled such that the average intensity value was 10,000.
Following preprocessing, we ran scan-level GLMs using FSL's FEAT (v6.00). We included event-related regressors for fear and neutral faces (convolved with a double-gamma HRF), their temporal derivatives (Pernet, 2014), and 24 head motion nuisance regressors (the 6 head realignment parameters, their temporal derivatives, and their squares (Power et al., 2012). Volumes with FD >0.9 mm were downweighed to 0 in the GLM. Pre-whitening was used to estimate and remove temporal autocorrelation (Woolrich, Ripley, Brady, & Smith, 2001). For each scan, we calculated fear > baseline, neutral > baseline, and fear > neutral contrasts. We used native-space bilateral amygdala masks generated using Freesurfer (v6.0;

| Multiverse analyses and specification curves
In addition to the preregistered pipelines, we conducted multiverse analyses to address all aims in Table 1 and constructed sets of separate specification curves for each aim (see Table 2). In general, multiverse analyses aim to probe the consistency of results across all "reasonable" possible combinations of analysis decisions (i.e., simultaneously taking all possible "forking paths"; Steegen, Tuerlinckx, Gelman, & Vanpaemel, 2016). Because analyzing fMRI data using all reasonable specifications was infeasible (i.e., possibilities are virtually infinite), we took the approach of "sampling" from the many reasonable or commonly-used analysis choices for each multiverse. Despite not being completely comprehensive, this approach still allowed for thorough investigation into the robustness of results. For all multiverse analyses, we constructed specification curves by ranking models by their beta estimates (

| Multiverse amygdala reactivity analyses
For amygdala reactivity analyses, we examined the robustness of agerelated change estimates to a variety of analytical decisions. In addition to the preregistered FSL-based pipeline, we preprocessed data using C-PAC software (v1.4.1; Craddock et al., 2013). We used C-PAC to take advantage of features supporting running multiple pipeline "forks" in parallel (for example multiple nuisance regression forks using the same registration). No spatial smoothing was used in C-PAC pipelines (see Supporting Information methods p. 14). Following C-PAC and FSL preprocessing, we examined the impact of different sets of commonly-used analysis methods on age-related change in amygdala reactivity. We varied analysis choices of GLM software, hemodynamic response function, nuisance regressors, first-level GLM estimates, amygdala ROI, exclusion criteria (exclude vs. include scans analyzed by Gee et al., 2013), group-level model outlier treatment, and group-level model covariates (see Table 2 and Supporting Information methods pp. 14-17). Multiverse analyses of amygdala reactivity included a total of 2,808 model specifications (156 ways of defining participant-level amygdala reactivity Â 18 group-level model specifications) for each contrast. We analyzed all specifications in parallel. In addition, we examined nonlinear age-related changes using quadratic and inverse age models (see Figures S14-S17) and ran a smaller set of analyses ( Figure S19) to ask whether we could differentiate within-participant change over time from between-participant For all specifications, individual-level amygdala reactivity estimates were submitted to a group-level multilevel regression model for estimation of age-related changes. All models allowed intercepts to vary by participant, and some specifications also allowed for varying slopes (see Supporting Information methods p. 15 for model syntax).
All models also included a scan-level covariate for head motion (mean FD; Power et al., 2012;Satterthwaite et al., 2012Satterthwaite et al., , 2013. Consistent with prior work, head motion was higher on average in younger children, and decreased with age (see Figure S5), though head motion was not associated with amygdala reactivity estimates for most specifications (see Figure S26). Age-related change findings examined for the preregistered pipeline also remained consistent under more stringent exclusion thresholds based on mean FD (see Figure S27). Across preprocessing specifications, we also examined within-scan similarity of amygdala and whole-brain voxelwise reactivity patterns (see

| Change in amygdala reactivity across trials
To probe whether amygdala reactivity exhibited within-scan change in an age-dependent manner, we modeled reactivity to each face trial using a least squares separate method (LSS; Abdulrahman & Henson, 2016). After preprocessing, we used FEAT to fit 48 separate GLMs corresponding to each trial in each scan. A given trial was modeled with its own regressor and the remaining 47 trials were modeled with a single regressor. Each GLM also included 24 head motion nuisance regressors and had TRs with FD >0.9 mm downweighted to 0. BOLD data were high-pass filtered at 0.01 Hz before the GLM. From each of the 48 GLMs, we extracted the mean amygdala beta estimates corresponding to a contrast for each single trial > baseline.
We constructed separate multiverse analyses using three different methods for measuring change in amygdala reactivity across trials. For Method 1 (slopes), we measured rank-order correlations between trial number and trial-wise amygdala betas. For Method 2 (trial halves), we split trials into the first half (Trials 1-12) and second half (Trials 13-24), and modeled age-related change in each half. For Method 3 (single-trial models), we constructed larger multilevel models with individual trials as the unit of observation. We conducted several analysis specifications for each method (see Table 2 and Supporting Information methods pp. 21-23), and generated corresponding specification curves.

| Multiverse amygdala-mPFC FC analyses
We applied multiverse analysis techniques toward examining agerelated changes in amygdala-mPFC FC using gPPI and beta-series correlation (BSC) methods. Briefly, gPPI estimates FC by constructing an interaction term between the timecourse in a seed region of interest and a stimulus (task) regressor. Voxels whose activity are well fit by this interaction term (a psychological-physiological interaction, or PPI) are assumed to be "functionally coupled" with the seed region in a way that depends on the behavioral task (McLaren, Ries, Xu, & Johnson, 2012;O'Reilly, Woolrich, Behrens, Smith, & Johansen-Berg, 2012). BSC offers a different way of estimating functioning connectivity, by constructing "timeseries" of beta values (i.e., a beta series) in a condition of interest for two regions of interest, and calculating the product-moment correlation between those beta series.
We constructed separate specification curves for age-related change in gPPI and BSC for each contrast. Across gPPI specifications, we varied whether to use a deconvolution step in creating interaction regressors Gitelman, Penny, Ashburner, & Friston, 2003), as well as several other analysis decision points (see Table 2 and Supporting Information methods pp. 24-25). The deconvolution step applies to the preprocessed BOLD data from the seed timecourse: these data are first deconvolved to estimate the "underlying neural activity" that produced the BOLD signal (Gitelman et al., 2003), then these deconvolved signals are multiplied with the task regressor (e.g., for fear faces). Finally, this new interaction term is convolved with a hemodynamic response function to produce the BOLD FC regressor of interest. Given recent work indicating that centering the task regressor before creation of the interaction term can mitigate spurious effects (Di, Reynolds, & Biswal, 2017), we also compared pipelines in which we centered the task regressor before deconvolution (pipelines including deconvolution in main analyses did not include this step; see Figure S44).
We preregistered constructing an mPFC ROI containing 120 voxels centered at the peak coordinates reported by Gee et al. (2013) for age-related change in fear > baseline gPPI (Talairach 2,32,8;or MNI 3,35,8). However, after preregistration we discovered that these peak coordinates were not at the center of the ROI reported by Gee et al. (2013), and were quite close to the corpus callosum. The 120-voxel ROI we created that was centered at this peak coordinate would have contained a high proportion of white matter voxels relative to cortical voxels (though this was not true for the mPFC ROI identified by Gee et al. (2013)). To address this issue, we instead constructed three spherical ROIs with 5 mm radii; the first centered at the above peak coordinates, the second shifted slightly anterior, and the third shifted slightly ventral relative to the second (see Figure 4). Lastly, to examine amygdala FC with a more broadlydefined mPFC, we also used a "large vmPFC" mask encompassing many of the areas within the ventromedial prefrontal cortex derived from Mackey and Petrides (2014).
For BSC analyses, we used beta estimates from the LSS GLMs described above for analyses of within-scan change in amygdala reactivity. Across BSC specifications we varied analyses across several decision points (see Table 2 and Supporting Information methods p. 26), including whether to include a correction for global signal (post hoc distribution centering [Fox, Zhang, Snyder, & Raichle, 2009]). We extracted mean beta estimates for amygdala and mPFC ROIs for each trial, then calculated product-moment correlations between the timeseries of betas across trials (neutral and fear separately) for both regions (Di et al., 2020). These correlation coefficients were transformed to z-scores, then submitted to group-level models.
Age-related changes in gPPI and BSC were estimated using multilevel regression models as described for the amygdala reactivity analyses. We focused primarily on linear age-related change, but also examined quadratic and inverse age associations (see Figures S45-S48 and S58-S61). We separately examined group mean gPPI and BSC for each contrast (see Figures S38 and S52), as well as associations between mean FD and both FC measures across specifications (see Figures S49 and S62). Additionally, we examined mean estimates and age-related change in "task-independent" FC as measured by beta weight of the "physio" term from the seed amygdala timeseries within the gPPI model (representing baseline amygdala-mPFC FC controlling for task-induced variance; Figures S50 and S51).

| Multiverse analyses of associations between amygdala and mPFC circuitry and separation anxiety behaviors
We used further multiverse analyses to ask whether amygdala reactivity, change in amygdala reactivity over the course of the task, or amygdala-mPFC FC were associated with separation anxiety behaviors. Separate specification curves were created for each brain measure type (amygdala reactivity, amygdala reactivity change across trials, and amygdala-mPFC FC). All analyses used multilevel regression models with covariates for age, and specification curves included both RCADS-P and SCARED-P separation anxiety subscales as outcomes (see Table 2 and Supporting Information methods pp. 29-30).
Because we did not have parent-reported RCADS-P or SCARED-P scores for 10 adult participants, these analyses had an N = 173.

| Model-fitting
All statistical models fit at the group level were run in the R (v 3.6.1) computing environment. In order to most accurately model agerelated changes in each of our measures, we attempted to take into account both between-participants information and repeated mea-

| Interactive visualizations
Because static plots visualizing the model predictions for all models in each multiverse would require far more page space than available, we created web-based interactive visualization tools for exploring different model specifications and viewing the corresponding raw (participant-level) data and fitted model predictions using R and Shiny (Beeley, 2013). These visualizations can be found at https://pbloom. shinyapps.io/amygdala_mpfc_multiverse/

| Deviations from preregistration
Although we largely completed the preregistered analyses, the current study includes many analyses beyond those proposed in the initial preregistration. Because the additional analyses (i.e., all multiverses) conducted here give us substantial analytical flexibility over that initially indicated by preregistration, we consider all results here to be at least in part exploratory (rather than completely confirmatory), despite the preregistered hypotheses. Additionally, we note that BSC analyses, analyses of change in amygdala reactivity across trials, and analyses of associations between all brain measures and separation anxiety were exploratory, and conducted after we had seen the results of the preregistered reactivity and gPPI analyses. In addition, to avoid possible selection bias introduced by the analytical flexibility inherent in running many parallel analyses, we consider all analysis specifications simultaneously, emphasizing that without further methodological work, we consider all such choices in tandem as providing equal evidential value. While reliability analyses were not preregistered, they too provide key information for interpreting the current analyses.

| Age-related change in amygdala reactivity
We used multilevel regression models and specification curve analyses to examine age-related changes in amygdala reactivity to faces in an accelerated longitudinal sample ranging from ages 4 to 22 years ( Figure 2). Across specifications, we found relatively consistent evidence for negative age-related change in anatomically-defined (Harvard-Oxford atlas and Freesurfer-defined) amygdala reactivity to fear faces > baseline, such that the vast majority of analysis specifications (99.6%) estimated linear slopes at the group level that were negative in sign, and the majority (60.0%) of 95% posterior intervals about these slopes excluded 0 (Figure 2a; interactive version at https://pbloom.shinyapps.io/amygdala_mpfc_multiverse/). Thus, over half of models indicated that on average, increases in age were associated with decreases in amygdala reactivity to fear faces > baseline.
Because the Wave 1 data in the current study included the 42 scans used by Gee et al. (2013)  when 42 previously analyzed scans (ages 4-17 years) were excluded to provide stricter independence from previously analyzed data (see Figure S11, Gee et al., 2013). Estimated average age-related change for the fear > baseline contrast was somewhat stronger when using a right amygdala ROI compared to the left amygdala, and when using t-F I G U R E 2 Multiverse analyses of age-related change in amygdala reactivity. (a). Specification curve of age-related change in fear > baseline amygdala reactivity. Points represent estimated linear age-related change and lines are corresponding 95% posterior intervals (PIs). Models are ordered by age-related change estimates, with the dotted line representing the median estimate across all specifications. Color indicates sign of beta estimates and whether respective posterior intervals include 0 (red = negative excluding 0; blue = negative including 0, green = positive including 0, black = median across all specifications). (b). Model specification information corresponding to each model in A. Variables on the yaxis represent analysis choices, corresponding color-coded marks indicate that a choice was made, and blank space indicates that the choice was not made in a given analysis. Within each category panel (amygdala ROI, Group-Level Model, and Participant-Level Model), decision points are ordered from top to bottom by the median model rank when the corresponding choice is made (i.e., choices at the top of each panel tend to have more negative age-related change estimates). Black points with error bars represent the median and IQR ranks of specifications making the choice indicated on the corresponding line. (c). Example participant-level data and model predictions for age-related related change in amygdala reactivity for both the fear > baseline (green) and neutral-baseline (orange) contrasts. Data are shown for a preregistered pipeline using a native space bilateral amygdala mask, 24 motion regressors, t-statistics, high-pass filtering, and participant-level GLMs in FSL. Points represent participant-level estimates, light lines connect estimates from participants with multiple study visits, and dark lines with shaded area represent model predictions and 95% posterior intervals. (d). Specification curves for a subset of models separately parametrizing within-participant (right) vs. between-participant (left) age-related change for both the fear > baseline (green) and neutral > baseline (orange) contrasts, as well as the median across specifications (black). See https://pbloom.shinyapps.io/amygdala_mpfc_multiverse/ for interactive visualizations stats extracted from scan-level GLMs rather than beta estimates for group-level models (see Figure S11).
Parallel multiverse analyses found similarly consistent age-related decreases in neutral faces > baseline amygdala reactivity (see Figure 2c for an example pipeline and Figure S9 for specification curve), but no consistent evidence for age-related change for the fear > neutral contrast (see Figure S10). However, there was consistent evidence for higher reactivity for fear faces > neutral on average as well as each emotion compared to baseline ( Figures S6-S8), indicating that while the amygdala responses were robust and generally stronger for fear faces compared to neutral, such fear > neutral differences did not change with age. Across contrasts, varying the inclusion of block order or scanner covariates, inclusion of random intercepts, and use of robust regression models had little impact on age-related change estimates (see Figures 2b and S6-S8).
While group-level estimates of average age-related change were relatively consistent across specifications, the estimated age terms in these models could be influenced by both within-participant change and between-participant differences (King et al., 2018;Madhyastha et al., 2018). A smaller separate specification curve indicated that when models were parametrized to differentiate within-participant change and between-participant differences, average withinparticipant change was not consistent across specifications and could not be estimated with precision ( Figure 2d). In contrast, estimates of between-participant differences largely indicated negative age-related differences in concurrence with our initial model parametrization. At the same time, within-participant versus between-participant terms were not reliably different from one another, indicating that models could not distinguish them despite higher precision for estimating between-participant differences (see Figure S19). We did not find consistent evidence for quadratic age-related changes in amygdala reactivity (see Figures S14-S17). Inverse age models (i.e., amygdala reactivity modeled as a function of 1/age) indicated results similar to those of linear and quadratic models with most specifications for the fear > baseline and neutral > baseline (though less consistent) contrasts indicating age-related decreases (see Figure S18).

| Age-related differences in within-scan amygdala reactivity change
To ask whether age-related changes in amygdala reactivity could be due to developmental changes in patterns of amygdala reactivity across face trials (within a run), we examined whether within-scan change in amygdala reactivity varied with age (see Table 1 Aim 1b).
Across both fear and neutral trials, linear slopes of amygdala reactivity were negative on average, indicating higher amygdala reactivity at the beginning of the run (Figure 3a and Figure S30). Across specifications, for both fear (100% of estimates had the same sign, 95.2% of posterior intervals excluding 0 in the same direction) and neutral trials (100% of estimates in the same direction, 38.1% of posterior intervals excluding 0), there was evidence that these within-scan slopes were steeper (i.e., more negative) at younger ages, though evidence was relatively weaker for neutral trials (Figure 3d,e). Specifications with a global signal subtraction step also tended to find stronger age-related change.
Similarly, when splitting trials into the first half (Trials 1-12) versus second half (Trials 13-24), there was consistent evidence (100% of estimates had the same sign, 69.2% with posterior interval excluding 0) for an interaction between age and trial half, such that average reactivity to fear faces > baseline in the first half of trials decreased as a function of age more so than did average reactivity during the second half of trials (see Figures 3b and Figure S32). This interaction was in the same direction for neutral trials across most specifications (88.5% of estimates), but was typically not as strong (3.8% of posterior intervals excluding 0). Single-trial models indicated similar age-related change in within-scan amygdala dynamics (see Figure 3c and S33 and S34). Mean group-level amygdala reactivity was higher for the first half of trials for fear faces > baseline across several specifications, though there were not consistent differences between trial halves for mean amygdala reactivity to neutral faces ( Figure S31).

| Age-related change in task-evoked amygdala-mPFC functional connectivity
We used multilevel regression modeling and specification curve analyses to examine age-related change in task-evoked amygdala-mPFC FC within the accelerated longitudinal cohort (see Table 1  ROIs is presented in Figure 4b. While mPFC ROI definition and other analysis decision points also influenced estimates of age-related change in gPPI (Figure 4d), follow-up regression models indicated that the effect of including the deconvolution step was several times larger for the fear > baseline contrast (see Figures S41-S43).
Through equivalent multiverse analyses we also found no evidence of consistent linear age-related change in amygdala-mPFC gPPI for the neutral > baseline and fear > neutral contrasts (see Figures S39 and S40), or nonlinear change for any contrast (see Figures S45-S48). In addition, we did not see consistent evidence for group average amygdala-mPFC gPPI for any contrast, though such results often differed as a function of whether a deconvolution step was included (see Figure S38). Though we included gPPI analysis specifications excluding the 42 scans at timepoint 1 studied by Gee et al. (2013), exclusion of these scans had little impact on age-related change results (see Figure 4d).
In addition to gPPI analyses, we used BSC analyses to examine age-related changes in task-evoked amygdala-mPFC connectivity (see Table 1 Aim 2b). As with gPPI, multiverse analyses of amygdala-mPFC BSC (168 total specifications; 3 amygdala ROI definitions Â 4 mPFC ROI definitions Â 2 global signal options Â 7 group-level models) for fear trials (vs. baseline) did not yield strong evidence of age-related change across pipelines (84.5% of point estimates in the same direction, 24.4% of posterior intervals F I G U R E 3 Age-related change in amygdala reactivity across trials. (a). An example model of estimated age-related change in slopes of beta estimates across both fear (green) and neutral (orange) trials. Negative slopes represent higher amygdala activity in earlier trials relative to later trials. (b). Example models of estimated age-related change in amygdala reactivity for the fear > baseline (left) and neutral > baseline (right) contrasts for both the first (red) and second (blue) halves of trials. In both a and b, points represent participant-level estimates, light lines connect estimates from participants with multiple study visits, and dark lines with shaded area represent model predictions and 95% posterior intervals. (c). Example single-trial model predictions of estimated amygdala reactivity for fear (left) and neutral (right) faces as a function of age and trial number. Age was modeled as a continuous variable, and average predictions for participants of age 6 (red), 12 (green), and 18 (blue) years are shown for visualization purposes. All estimates in a-c shown are from an example analysis pipeline using bilateral amygdala estimates and without global signal correction. (d). Specification curve for age-related change in slopes across fear trials (i.e., many parallel analyses for the fear trials in subplot b). (e). Specification curve for age-related change in slopes across neutral trials (i.e., neutral trials in plot b). GSS = global signal correction using post hoc mean centering. For both d and e, color indicates sign of beta estimates and whether respective posterior intervals include 0 (green = positive including 0, purple = positive excluding 0, and black = median across all specifications), and horizontal dotted lines represent median estimates across all analysis decisions. Variables on the y-axis represent analysis choices, corresponding color-coded marks indicate that a choice was made, and blank space indicates that the choice was not made in a given analysis excluding 0; Figure 5a, interactive version at https://pbloom.
shinyapps.io/amygdala_mpfc_multiverse). Unlike gPPI analyses, however, choice of mPFC ROI (as well as amygdala ROI, though this was not examined for gPPI) most impacted age-related change in BSC estimates, rather than preprocessing or modeling analytical choices (Figures 5b and S55-S57). Accordingly, while global signal subtraction resulted in weaker amygdala-mPFC BSC on average (see Figure S52), inclusion of this step did not consistently affect age-related change estimates (Figure 4c). We did not find consistent evidence for age-related change in amygdala-mPFC BSC for neutral trials (vs. baseline), or for fear > neutral trials ( Figures S53 and S54).
We did not find consistent evidence for nonlinear age-related change for any contrast (Figures S58-S61).
Additionally, we constructed a correlation matrix using rank-order correlations of scan-level BSC and gPPI estimates for the fear (vs. baseline) condition. Across scans, there was little evidence of correspondence between BSC and gPPI metrics for amygdala-mPFC connectivity (Figures 5d and S63-S66). Further, FC estimates tended to be positively correlated within a method type (BSC, gPPI) across mPFC ROIs, though less strongly for gPPI estimates with versus without a deconvolution step.
In addition to gPPI and BSC methods for FC, we also explored between-scan associations between amygdala reactivity and mPFC reactivity ( Figures S28 and S29). Multilevel models indicated that amygdala reactivity for fear faces > baseline was positively associated with mPFC reactivity for fear faces > baseline for all mPFC ROIs, though we did not find consistent evidence for age-related changes in associations between amygdala and mPFC reactivity to fear faces > baseline (see Figure S29).

| Amygdala-mPFC measures and separation anxiety
We conducted multiverse analyses of associations between several amygdala-mPFC measures (amygdala reactivity, amygdala-mPFC FC, within-scan changes in amygdala reactivity) and separation anxiety behaviors (see Table 1 Aim 3). Separation anxiety behaviors on excluding 0), found consistent evidence for associations between brain measures and separation anxiety. Similar specification curves found little consistent evidence for associations between brain measures and generalized anxiety, social anxiety, or total anxiety behaviors (see Figure S67). All specifications controlled for age (see Supporting Information methods p. 30).
To more specifically follow up on previous work reporting associations between separation anxiety behaviors and amygdala-mPFC gPPI for fear > baseline specifically (Gee et al., 2013), we plotted model predictions for such models from the above multiverse analysis for each of the four mPFC ROIs, across all three separation anxiety outcome measures, and both with and without a deconvolution step ( Figure 6e). We did not find consistent evidence for associations with separation anxiety, and results showed high sensitivity to the deconvolution step, mPFC ROI, and outcome measure used.

| Reliability
To examine test-retest reliability estimates of amygdala-mPFC measures across longitudinal visits, we computed Bayesian ICC estimates using a variance decomposition method (Lüdecke et al., 2021).
Because such models can accommodate missing data, all observations

| DISCUSSION
Measures that are both robust to researcher decisions and reliable across measurement instances are critical for studies of the human brain (Botvinik-Nezer et al., 2020;Bowring et al., 2019;Elliott et al., 2020;Xu et al., 2022). The accelerated longitudinal design and multiverse analysis approach used in the current study allowed a rare opportunity to examine both reliability and robustness of amygdala-mPFC measures using a rapid event-related face task from early childhood through young adulthood. Overall, estimates for age-related change in amygdala reactivity were relatively robust to a variety of analytical decision points, while age-related change estimates for amygdala-mPFC connectivity were more sensitive to researcher choices. gPPI analyses were particularly sensitive to whether a deconvolution step was applied. Yet, in concurrence with previous work (Elliott et al., 2020;Haller et al., 2022;Herting et al., 2017;Infantolino, Luking, Sauder, Curtin, & Hajcak, 2018;Kennedy et al., 2021;Nord, Gray, Charpentier, Robinson, & Roiser, 2017;Sauder et al., 2013), amygdala-mPFC measures displayed consistently poor test-retest reliability across many analytical specifications.
While low reliability estimates in the present study may be due in part to the long ($18 months) test-retest interval (Elliott et al., 2020) and potential true developmental change (Herting et al., 2017), low reliability nevertheless imposes a major caveat toward interpretation of the current developmental findings.
The present findings are valuable from a methodological standpoint in evaluating the robustness of analytical tools used. A measurement can have high test-retest reliability yet low robustness (high sensitivity) to analytical decisions, or vice versa (Li et al., 2021).
Because neither robustness nor reliability guarantee the other, current findings on the impacts of analytic choices will likely be informative in F I G U R E 6 Multiverse analyses of associations between amygdala-mPFC circuitry and separation anxiety. (a). Age-related change in SCARED and RCADS raw and t-scores for parent-reported separation anxiety subscales. The red dotted line in the middle panel represents the clinical threshold for the standardized RCADS measure (because this t-score measure is standardized based on age and gender, no age-related change is expected). (b). Separate specification curves for associations of amygdala reactivity (left), amygdala-mPFC connectivity (both gPPI and BSC; center two panels), and amygdala reactivity slopes across trials (right) with the three separation anxiety outcomes shown in a. Points represent estimated associations between brain measures and separation anxiety (controlling for mean FD and age) and lines are corresponding 95% posterior intervals. Models are ordered by beta estimates, and the dotted line represents the median estimate across all specifications. Color indicates sign of beta estimates and whether respective posterior intervals include 0 (red = negative excluding 0, blue = negative including 0, green = positive including 0). Scores on each separation anxiety outcome were z-scored for comparison. (c). Example model predictions for associations between fear > baseline amygdala-mPFC gPPI and each separation anxiety measure. Predictions and 95% posterior intervals are plotted for each separation anxiety measure separately for each mPFC region, and for gPPI pipelines with and without a deconvolution step. Pipelines shown use robust regression, have random slopes, no covariates for task block or scanner, and no quadratic age term guiding future studies. Thus, we discuss each of the main analyses below, with particular emphasis on how findings are impacted by analytic choices.

| Amygdala reactivity
While there were differences across model specifications, the majority of pipelines supported our hypothesis that amygdala reactivity to fearful faces decreases with age from early childhood through early adulthood (see Table 1 Aim 1a). Across specifications, we found relatively robust evidence for age-related decreases in amygdala reactivity to both fearful and neutral faces (Figure 2a). Yet, findings also varied considerably across specifications. For example, only 60% of pipelines produced results that would be individually labeled as "significant" (under α = .05), indicating that multiple investigations of this dataset could likely lead to qualitatively different conclusions. While over half of analyses found evidence consistent with studies indicating greater amygdala reactivity to fear faces > baseline in younger children (Forbes et al., 2011;Gee et al., 2013;Guyer et al., 2008;Swartz et al., 2014), the other 40% of specifications would have been consistent with investigations that found little age-related change (NB: there were also differences in samples, age ranges, task parameters, and behavioral demands across these studies; Kujawa et al., 2016;Wu et al., 2016;Zhang et al., 2019). We also found that different specifications resulted in somewhat different nonlinear trajectories (see F I G U R E 7 Longitudinal test-retest Bayesian ICC estimates. ICC values are shown for amygdala reactivity (a), slopes of amygdala reactivity betas across trials (b), amygdala-mPFC functional connectivity using both gPPI and BSC methods (c), and separation anxiety and in-scanner head motion measurements (d). Shaded background colors depict whether ICC estimates are categorized as poor (<.4), fair (.4-.6), or good (.6-.75) reliability. No ICC estimates met the threshold for excellent reliability (>.75). Bayesian ICC estimates were calculated through a variance decomposition based on posterior predictive distributions. Negative values indicate higher posterior predictive variances not conditioned on random effect terms than conditioned on random effects terms Figures S14-S18). Not only did inverse age and quadratic age models find different trajectories (as would be expected), but quadratic trajectories themselves also displayed considerable analytic variability, with some specifications finding "convex" and others finding "concave" fits (see Figure S17). Although estimating nonlinear age-related change was not a primary goal of the present study, future work should use model comparisons for better differentiating nonlinear patterns (Curran, Obeidat, & Losardo, 2010;Luna et al., 2021).
Models also found evidence for between-participant differences, but could neither identify within-participant change (Figure 2d) nor differentiate between-participant from within-participant estimates.
As such, interpretation of the age-related change reported here is subject to many of the same limitations that apply to cross-sectional designs (Glenn, 2003) Age-related change in amygdala responses to fear faces over baseline seemed largely the result of earlier trials in the task (see Figures S32-S34). While differences in task design and contrast across studies have been highlighted as potential sources of discrepant findings on the development of amygdala function (Killgore & Yurgelun-Todd, 2007;Lieberman et al., 2007;Swartz et al., 2014), this result indicates that attention to trial structure and task duration may also be necessary in comparing studies. Because the paradigm used in the current study involved a task requiring participants to press for one face ("neutral") and not press for "fear" faces, findings specific to fear faces over baseline under the current paradigm may also be driven by behavioral task demands.

| Amygdala-mPFC functional connectivity
We did not find evidence for our second hypothesis, as neither gPPI nor BSC analyses indicated consistent evidence of age-related change in amygdala-mPFC FC (see Table 1 Aims 2a,b, Figures 4 and 5). Thus, the age-related changes in task-evoked amygdala-mPFC connectivity identified in prior work (Gee et al., 2013;Kujawa et al., 2016;Wu et al., 2016) were not identified here, consistent with (Zhang et al., 2019). Crucially, however, our specification curves did not find strong evidence against such age-related change, as we did not observe precise and consistent "null" estimates across specifications. Additionally, quadratic and inverse age models did not find consistent evidence for nonlinear age-related change (see Figures S45-S48 and S58-S61).
gPPI results were sensitive to whether a deconvolution step had been included in the preprocessing pipeline, such that we mostly found age-related decreases in amygdala-mPFC connectivity with a deconvolution step included, and age-related increases without it (although most pipelines would not have been "statistically significant" on their own, see Figure 4b). While deconvolution has been argued to be a necessary step for event-related PPI analyses (Gitelman et al., 2003), recent work has shifted guidelines on its use, and it may not be recommended for block designs (Di et al., 2020;. Because the true "neuronal" signal underlying the BOLD timeseries within a given ROI cannot be directly measured, deconvolution algorithms are difficult to validate. Further, deconvolution may cause PPI results to be driven by baseline connectivity if task regressors are not centered , although such centering did not have a major influence on age-related change results in the present analyses (see Figure S44). Within the current study, small tweaks to AFNI's 3dTfitter algorithm for deconvolution resulted in vastly different regressors (see Figure S36), suggesting the potential for high analytic variability even between gPPI analyses ostensibly using deconvolution. While the present study does not pro-  (Gratton et al., 2020;Power et al., 2019). Supporting this, BSC estimates were correlated with mean FD across scans for the fear > baseline and neutral > baseline contrasts only when a global signal correction was not applied (see Figure S62). In addition, while test-retest reliability for all BSC measures was poor, BSC estimates from pipelines including a global signal correction step mostly demonstrated somewhat higher ICC (Figure 6). While these results are consistent with prior work indicating that correcting for the global signal can mitigate artifacts (Ciric et al., 2017;Satterthwaite et al., 2012), other work indicates that such corrections also remove meaningful biological signals (Belloy et al., 2018;Glasser et al., 2018;Yousefi, Shin, Schumacher, & Keilholz, 2018).

| Amygdala-mPFC circuitry and separation anxiety
We did not find associations between any task-related amygdala-mPFC measures (reactivity or FC) and separation anxiety behaviors (see Table 1 Aim 3; Figure 6). This finding stands in contrast to associations between amygdala-mPFC connectivity and anxiety identified in previous developmental work (Gee et al., 2013;Jalbrzikowski et al., 2017;Kujawa et al., 2016;Qin et al., 2014). However, given that analyses of brain-behavior associations may require imaging cohorts much larger than the current sample (especially considering the low reliability of the measures used; Grady, Rieck, Nichol, Rodrigue, & Kennedy, 2020;Marek et al., 2020), the absence of relationships here may not be strong evidence against the existence of potential associations between amygdala-mPFC circuitry and developing anxietyrelated behaviors.

| Advantages and pitfalls of the multiverse approach
Our findings contribute to a body of work demonstrating that preprocessing and modeling choices can meaningfully influence results (Botvinik-Nezer et al., 2020). Indeed, most studies involving many analytical decision points could benefit from multiverse analyses. Such specification curves can help to examine the stability of findings in both exploratory and confirmatory research (Flournoy et al., 2020). Particularly when methodological "gold standards" have not been determined, specification curves may be informative for examining the impacts of potential analysis decisions (Bridgeford et al., 2020;Dafflon et al., 2020). Further, wider use of specification curves might help to resolve discrepancies between study findings stemming from different analysis pipelines.
While specification curve analyses may benefit much future research, we also note that multiverses are only as comprehensive as the included specifications (Steegen et al., 2016), and such analyses alone do not solve problems related to unmodeled confounds, design flaws, inadequate statistical power, circular analyses, or non- Computational resources are a relevant concern when conducting multiverse analyses as well. In the current study, preprocessing (registration in particular) was the most computationally intensive step, taking an estimated 4 hours of compute time per scan per pipeline using 4 cores on a Linux-based institutional research computing cluster. However, specification curve analyses themselves were relatively less intensive, with all group-level models of amygdala reactivity completing in a total of 48 hr using 4 cores on a laboratory Linux-based server. Specification curves using maximum likelihood models (lme4 in R; Bates, Maechler, & Bolker, 2011) were even more efficient, with thousands of models running within minutes using a 2019 MacBook Pro (2.8 GHz Intel Core i7).

| Limitations
The present study is subject to several limitations that may be addressed in future investigations. Perhaps most crucially, our conclusions (along with those of many developmental fMRI studies) are limited by the poor test-retest reliability of the fMRI data. Because amygdala-mPFC measures showed low reliability across study visits, the statistical power of our analyses of age-related changes is likely low (Elliott et al., 2020;Zuo, Xu, & Milham, 2019). Low-powered studies can yield increased rates of both false positive and false negative results (as well as errors of the sign and magnitude of estimates; Button et al., 2013;Gelman & Carlin, 2014); therefore, we caution against interpretation of our developmental findings (and brainbehavior associations) beyond the cohort studied in the present investigation. In particular, the low statistical power of our rapid eventrelated task design may be a major contributor to the low test-retest reliability and variance in outcomes across analysis specifications. That being said, achieving high-powered studies presents a challenge for studying populations that cannot tolerate lengthy fMRI sessions. Both findings that were more robust to analytical decisions (amygdala reactivity) and findings that were less so (amygdala-mPFC connectivity, associations with separation anxiety) may be most valuable in metaanalytic contexts where greater aggregate statistical power can be achieved. In particular, future work on amygdala-mPFC development will benefit from optimization of measures both on robustness to analytic variability (Li et al., 2021) and reliability (Kragel, Han, Kraynak, Gianaros, & Wager, 2021).
Present findings are also limited by the number of participants studied (Bossier et al., 2020;Marek et al., 2020), the number of longitudinal study sessions per participant (King et al., 2018), and the duration of the task (Nee, 2019). Work with larger sample sizes, more study sessions per participant, and more task data collected per session will be necessary for charting functional amygdala-mPFC development and examining heterogeneity across individuals (although collecting task-based fMRI will continue to be challenging for studies including younger children). The generalizability of the current findings may also be limited by the fact that this cohort was skewed toward high incomes and not racially or ethnically representative of the Los Angeles or United States population.
Findings are also somewhat limited by the fact that the present study is not wholly confirmatory, despite preregistration. Because our multiverse analysis approaches expanded significantly beyond the methods we preregistered, most of the present analyses, while hypothesis-driven, must be considered exploratory (Flournoy et al., 2020). The fact that some specifications used data included in previous similar analyses of the same cohort (Gee et al., 2013) also limits the confirmatory power of the present study (Kriegeskorte, Simmons, Bellgowan, & Baker, 2009). This may be especially true because longitudinal models could not identify within-person change as distinct from between-participant differences (see Figure 2d), indicating that our age-related change estimates may be influenced by cross-sectional information similar to that investigated by Gee et al. (2013).
Though the current study aimed to estimate longitudinal agerelated changes in amygdala-mPFC functional circuitry evoked by fear and neutral faces, the current findings may not be specific to these stimuli (Hariri et al., 2002). Because our task did not include non-face foils or probe specific emotion-related processes, results may be driven by attention, learning, or visual processing, rather than affective or face processing. In particular, because participants were instructed to press a button for neutral faces and withhold a button press for fear faces, observed amygdala-mPFC responses may in part reflect response inhibition (for fear faces; Menon, Adleman, White, Glover, & Reiss, 2001) and target detection processes (for neutral faces; Jonkman, Lansbergen, & Stauder, 2003). Findings for the fear > baseline and neutral > baseline contrasts also may not be valencespecific in the absence of a different emotional face as part of the contrast. Further, because all faces were adult White women, the current results may not generalize to faces more broadly (Richeson, Todd, Trawalter, & Baird, 2008;Telzer, Humphreys, Shapiro, & Tottenham, 2012). Additionally, because face stimuli were the same across study visits, exposure effects across sessions may confound longitudinal findings (although exposure effects may be possible any time a task is repeated, even if stimuli are unique), particularly agerelated decreases in amygdala responses (Telzer et al., 2018). While within-session amygdala habituation effects have been shown across several paradigms (Geissberger et al., 2020;Hare et al., 2008;Hein et al., 2018), between-session habituation effects are unlikely beyond 2-3 weeks (Geissberger et al., 2020;Johnstone et al., 2005;Plichta et al., 2014;Spohrs et al., 2018).
Finally, our findings on age-related change in amygdala and mPFC function may be biased or confounded by age-related differences in head motion (Ciric et al., 2017), anatomical image quality and alignment (Gilmore, Buser, & Hanson, 2020;Rorden, Bonilha, Fridriksson, Bender, & Karnath, 2012), signal dropout, and physiological artifacts (Boubela et al., 2015;Fair et al., 2020;Gratton et al., 2020). While our multiverse analyses included preprocessing and group-level modeling specifications designed to minimize some of such potential issues, future work is still needed to optimize discrimination of developmental changes of interest from such potential confounds.
Despite these limitations, the present study concords with prior investigations in demonstrating the value of multiverse approaches to quantify sensitivity to researcher decisions. The results highlight key analytic considerations for future studies of age-related changes in amygdala-mPFC function, as well as for studies of human brain development more broadly.

CONFLICT OF INTEREST
The authors declare no potential conflict of interest.