Parameter clustering in Bayesian functional principal component analysis of neuroscientific data

The extraordinary advancements in neuroscientific technology for brain recordings over the last decades have led to increasingly complex spatiotemporal data sets. To reduce oversimplifications, new models have been developed to be able to identify meaningful patterns and new insights within a highly demanding data environment. To this extent, we propose a new model called parameter clustering functional principal component analysis (PCl‐fPCA) that merges ideas from functional data analysis and Bayesian nonparametrics to obtain a flexible and computationally feasible signal reconstruction and exploration of spatiotemporal neuroscientific data. In particular, we use a Dirichlet process Gaussian mixture model to cluster functional principal component scores within the standard Bayesian functional PCA framework. This approach captures the spatial dependence structure among smoothed time series (curves) and its interaction with the time domain without imposing a prior spatial structure on the data. Moreover, by moving the mixture from data to functional principal component scores, we obtain a more general clustering procedure, thus allowing a higher level of intricate insight and understanding of the data. We present results from a simulation study showing improvements in curve and correlation reconstruction compared with different Bayesian and frequentist fPCA models and we apply our method to functional magnetic resonance imaging and electroencephalogram data analyses providing a rich exploration of the spatiotemporal dependence in brain time series.

The extraordinary advancements in neuroscientific technology for brain recordings over the last decades have led to increasingly complex spatiotemporal data sets. To reduce oversimplifications, new models have been developed to be able to identify meaningful patterns and new insights within a highly demanding data environment. To this extent, we propose a new model called parameter clustering functional principal component analysis (PCl-fPCA) that merges ideas from functional data analysis and Bayesian nonparametrics to obtain a flexible and computationally feasible signal reconstruction and exploration of spatiotemporal neuroscientific data. In particular, we use a Dirichlet process Gaussian mixture model to cluster functional principal component scores within the standard Bayesian functional PCA framework. This approach captures the spatial dependence structure among smoothed time series (curves) and its interaction with the time domain without imposing a prior spatial structure on the data. Moreover, by moving the mixture from data to functional principal component scores, we obtain a more general clustering procedure, thus allowing a higher level of intricate insight and understanding of the data. We present results from a simulation study showing improvements in curve and correlation reconstruction compared with different Bayesian and frequentist fPCA models and we apply our method to functional magnetic resonance imaging and electroencephalogram data analyses providing a rich exploration of the spatiotemporal dependence in brain time series.
images which vary over a continuum, usually time or space (see Ramsay and Silverman 1 for an overview). In the FDA framework, data can be considered as noise-corrupted, discretized realizations of underlying smooth functions (curves or trajectories) which are recovered using basis expansions and smoothing. 2 Many standard statistical tools have been translated into the FDA framework. Functional principal component analysis (fPCA) is a technique that defines a set of smooth trajectories as an expansion of orthonormal bases (eigenfunctions) and weights which are called functional principal component scores (fPC scores). 1 One of the advantages of fPCA is that it can be conveniently represented as a hierarchical mixed model in the Bayesian setting, with the joint posterior distribution of the fPC scores being the main target of inference. 3 There has been a growing interest in applying FDA to neuroscientific data (see, among others, Viviani et al, 4 Tian et al, 5 and Hasenstab et al 6 ). Often, in the FDA literature, underlying random curves are assumed to be independent and their correlation is ignored if believed to be mild. 7 However, curve dependence is of particular importance in the analysis of brain activity because of the complex architecture of spatiotemporal connections between brain areas. 8 Recently, Liu et al 7 considered spatial dependence among trajectories by modeling the covariance of the fPC scores within a frequentist approach. Their results showed significant improvements in curve reconstruction compared with the standard approach assuming independence, especially with low signal-to-noise ratios.
The present study introduces a new method for the analysis of functional data in neuroscience. We develop a novel Bayesian fPCA model called parameter clustering fPCA (PCl-fPCA) that makes use of a Dirichlet process (DP) mixture [9][10][11] to model the prior distribution of the fPC scores. Different functional mixture models that cluster functions through clustering of the coefficients in a basis expansion have been proposed in the literature. [12][13][14][15][16][17][18][19] However, these works have focused on a global clustering of curves, without considering local differences as well as the possibility of a dynamic evolution of dependence among curves. In this work we use the principal component bases due to their straightforward interpretation and employ DP mixture priors for every eigendimension retained. By allowing different clustering of the fPC scores for each eigendimension retained, we avoid the limitations of assuming separability of the cross-covariance and any a priori spatial covariance structure of the data, obtaining further insights from space-time interactions. The study of how interactions among brain regions change dynamically during an experiment (ie, dynamic functional connectivity) has recently attracted wide interest in the neuroimaging literature. This analysis has the potential to improve our understanding of how the brain works under both physiological and pathological conditions with recent studies focusing on the application of dynamic functional connectivity to aging, 20 schizophrenia, 21 dementia, and Parkinson's disease. 22 This is a new frontier for neuroscientific research and the development of suitable models able to capture the intricate spatiotemporal dynamics in the data will lay the foundations for the progress in this area in coming years. 23 In this regard, we show that our approach has multiple advantages in the analysis of neuroscientific data as it offers further insights into the spatiotemporal structure of the data as a result of dimension-specific curve classification; it improves curve reconstruction thanks to the local borrowing of information compared with current fPCA approaches; and it can be defined as a simple and computationally feasible hierarchical model which can be easily implemented in R.
The rest of the article is structured as follows: in Section 2 we overview the standard Bayesian fPCA model and introduce our new method, along with computational details. Section 3 reports the setting and results of a simulation study where we compare the performance of PCl-fPCA with standard Bayesian and frequentist fPCA approaches under different data generating processes and noise levels. Section 4 addresses the application of our method to a resting-state fMRI data set and a task-based EEG recording and we discuss the further insights obtained in the spatiotemporal structure of the data and the underlying neurophysiological processes. Conclusions are discussed in Section 5.

Bayesian functional PCA
The standard FDA model is given by where Y it denote the noise-corrupted, discretized, observed data for every spatially correlated region (trajectory) i = 1, … , n and time point t = 1, … , T; X it is the associated underlying random curve as a realization of an L 2 stochastic process {X t ∶ t ∈ [1, T] ⊆ } with mean t and covariance function G(s, t); and it the noise term with zero mean and precision . 24 Functional PCA assumes that the covariance kernel G(s, t) of the process X t can be represented by the Karhunen-Loève expansion, such that where kt are orthonormal eigenfunctions and k are the associated eigenvalues. Then, each realization X it can be represented by a linear combination of eigenfunctions kt , which are usually assumed to be observed, and fPC scores ik , which are the main goal of inference. The reader is referred to chapter 8 of Ramsay and Silverman 1 and the recent review of Joliffe and Cadima 25 for a more detailed presentation of functional PCA. Although the number of eigendimensions can also be modeled with an appropriate distribution (see, eg, Suarez et al 26 ), this considerably increases the computational complexity of the model and thus in practice only K predetermined terms of the linear expansion are retained pertaining to those that explain a sufficiently large part of the total variability in the data. 27 Often the case t = 0 is assumed and the centered dataỸ it are obtained by subtracting an estimatêt of the population average. 3 The fPC scores ik are given prior probability distributions in the Bayesian framework. The standard Bayesian fPCA model 3 assumes fPC scores to be independent draws from a univariate zero-centered normal distribution whose variance is dependent on the eigendimension k. The most straightforward hierarchical representation of the standard Bayesian fPCA model isỸ with a, a ′ , b, b ′ usually set to low values (eg, 10 −3 ). In this model the noise term is assumed to be Gaussian and independent gamma priors are placed over the precision parameters because of their conjugacy property, permitting closed-form conditional posterior distributions and the use of Gibbs sampling. Recently, Liu et al 7 proposed to capture spatial dependence through a suitable model for the covariance of fPC scores. In particular, they defined Cov( ik , i ′ k ) as a function of the correlation coefficient ii ′ k which they modeled using the Matérn function family and estimated the corresponding parameters. This approach implies the a priori definition of a covariance structure which depends on the distance between observations; such assumptions might not be suitable for complex spatiotemporal phenomena such as brain activity where dependencies are the result of both structural and functional neuronal pathways as well as task-specific characteristics. In this study, we overcome these limitations to achieve a higher level of flexibility in the modeling of the spatiotemporal covariance of neuroscientific data.

PCl-fPCA model
In this section we present the structure of the PCl-fPCA model and the features of this approach that improve the current methods for functional PCA. The following hierarchical model defines the probability distribution generating observed time series. We present and comment on each level separately.
Level 1: As the standard Bayesian fPCA model in Equation (4), the distribution of the centered data given the parameters of the underlying smooth function and the noise term is given by: whereỸ i , X i , and k are T-dimensional vectors and N T (X i , −1 I) denotes a multivariate Gaussian distribution with mean X i and variance-covariance matrix −1 I such that I denotes the T × T identity matrix. As in Equation (4), the eigenfuctions k are assumed to be observed and the parameter does not depend on i or t, that is, the noise is assumed to be constant in both space and time, although other characterizations are possible. 24 It follows that the likelihood function is given by Level 2: To encode fPC scores cluster membership we introduce a classification variable c ik as a stochastic indicator that identifies which latent class j in eigendimension k is associated with parameter ik . Prior distributions of the fPC scores ik , given the parameters of underlying clusters [( 1k , s 1k ), … , ( Jk , s Jk )] and the classification variable c ik , are given by where c ik =j and s c ik =j denote the mean and precision for the jth cluster in the kth eigendimension, respectively. Here we use a J-dimensional mixture of Gaussian distributions, independently, for each retained eigendimension k = 1, … , K as we permit different (independent) partitions of the fPC scores for each mode of variation. It is worth recalling that, in the context of DP mixtures, J represents an upper bound on the number of fPC score clusters. 28 In the rest of the article we define J + k < J as the (data-driven) number of nonempty clusters in each eigendimension k. 29 Level 3: Prior distributions for [( 1k , s 1k ), … , ( Jk , s Jk )] and (c 1k , … , c nk ), given hyperparameters r k , k and parameters (p 1k , … , p Jk ), are given by where f C denotes the categorical distribution which generalizes the Bernoulli random variable to J outcomes. Cluster precision s jk can also be modeled using uniform distributions on the cluster standard deviation where jk = 1∕ √ (s jk ). 30 Hyperparameters r and are often centered around empirical estimates in the literature; 31 here, we take advantage of the properties of fPCA decomposition to tune the higher hierarchical levels in our model around weakly informative prior distributions. It follows from the Karhunen-Loève representation that, for any given i, ik are uncorrelated fPC scores with monotonically decreasing variance given by the eigenvalues k ; 7 therefore, sensible functions of the empirical estimates of the eigenvalueŝk can be used to fix r and under the assumption that, for every eigendimension k, the position and dispersion of a cluster are both functions of̂k. We note that setting r = 1∕̂k and =̂k worked well in our simulations and application. Levels 4 and 5: Prior distribution for (p 1k , … , p Jk ), given hyperparameter and prior distribution for are given by where p jk follow the stick-breaking construction 32 with parameter k modeling the prior belief over the mixing proportions p 1k , … , p Jk . The dispersion parameter is usually fixed or modeled with a prior distribution; here we used a uniform distribution with sufficiently large Q. 11,[33][34][35] Different specifications of s jk and Q can be employed for k = 1 and k = 2, … , K to incorporate the knowledge that the first eigendimension is more likely to capture global patterns in the data while the following dimensions are more sensitive to local features. For example, in the first eigendimension one can use the gamma distribution for the cluster precision in Equation (8) as it assigns more weights to large clusters than a uniform on the standard deviation which can be used instead in the subsequent dimensions. We provide specific examples in Section 3.1 and the results of a sensitivity analysis on Q, , and s in the WebA section of the Supplementary Material file.
The model structure can be displayed with a direct acyclic graph (DAG) (Supplementary Material, WebB section, Figure 1). As J approaches infinity the model corresponds to a DP mixture model 10,11,33,34,36 with the difference that we have placed here multiple independent mixtures over the prior distribution of the fPC scores. In practice we used the truncated stick-breaking construction and tested the model with different commonly chosen values of J (J = 20, 30, and 50). The upper bound J should be chosen sufficiently large to ensure J + k < J in each eigendimension. Larger Js will naturally impact on computations (eg, in our applications we observed the computational time of the model with J = 50 to be ∼1.5 higher than with J = 20). All the conditional posteriors of this model (most of them available in closed form) are provided in the Appendix. Markov chain Monte Carlo (MCMC) techniques are used to simulate from the joint posterior distribution of all parameters given the data. Reconstruction of the smooth trajectories x it is made easy by its linear relationship with the model parameters ik ; thus it is possible to obtain the posterior distribution of the ith curve for every t and at every MCMC iteration w, where x t is the smoothed estimate of the sample mean ∑ n i=1 y it ∕n. It follows that symmetric 95% pointwise credible intervals for each trajectory-specific mean can be obtained easily from Equation (10) by considering the (1 − )∕2 and ∕2 quantiles of the {x (1) it , … , x (W) it } empirical distribution.

Clustering
In this section we focus on the clustering of fPC scores. The discrete nature of the DP is very useful for clustering as it allows ties among the latent c ik ; 37 therefore, DP mixtures implicitly return classification through the allocation of each fPC score to a generating distribution with some probability. Clustering uncertainty can be evaluated at different levels such as the number of clusters, the size of each cluster, and the fPC scores assigned to them. For the explorative purpose of our model we avoid the use of automated algorithms to select a final partition of the fPC scores (either classical hierarchical or partitioning algorithms based on the similarity matrix 34 or more recently proposed algorithms based on a loss function over clusterings 38 ). Instead, we propose a three-step exploration of the empirical distribution of generated clusterings which we find useful to evaluate clusters uncertainty arising from the data. After burn-in, the empirical distribution of generated clusterings {c (1) k , … , c (W) k } can be considered a good approximation of the true posterior distribution 10 and it can be used to obtain other distributions of interest, such as the number and size of nonempty clusters, maximum a posteriori probabilities (MAPs), and pairwise probability matrices (PPMs). We make use of these distributions in a three-step exploration.
Step 1: The distribution of the number of nonempty clusters J + k can be obtained by exploring the values of the classification variable c k for all the W iterations retained after burn-in (J +,w Although considering the number of nonempty clusters J + k does not account for size and stability (ie, the number of times a cluster appears in the MCMC chain), the distribution of J + k provides a useful first check for assessing the presence of more than one cluster in each eigendimension. For this purpose, we used the Bayes factor (BF) defined as denote posterior probabilities and P(J + k = j) is the relative prior probabilities which can be obtained by simulating from the prior distribution of c k . A BF greater than 1 suggests absence of clusters in the fPC scores of a specific eigendimension; hence, this step identifies those eigendimensions where clusters are more likely to exist in the data.
Step 2: The distribution of the cluster size can be obtained by counting for each iteration w the number of fPC scores allocated to the same label or by monitoring the posterior distribution of the mixing proportions p jk . Although there is no guarantee that fPC scores joining a cluster remain loyal to it, the size of clusters permits the identification of clusters which are populated only sporadically as a result of the uncertainty in the classification of subsets of fPC scores. The distribution of these clusters has typically a notable probability mass at zero. Therefore, this second step can help understand the number and dimension of clusters we expect to see in each eigendimension and the relative uncertainty.
Step 3: Finally, MAPs and PPMs can help refine our understanding of the underlying clustering. MAPs are commonly used to identify the most probable clustering for each observation and they can be computed by identifying for each fPC score the posterior mode of c ik from the empirical distribution of generated clusterings. MAPs are known to be limited by the possible presence of multiple modes and cases where individuals who share the same modal group are less frequently together than with others in different clusters. These issues can be addressed by the PPMs which represent the posterior belief for all pairs of curves to belong to the same cluster regardless of the clustering label. 33,34,36 For each iteration w, an n × n association matrix (c k ) can be obtained with indicators ii ′ (c k ) which takes value 1 if fPC score i and i ′ in eigendimension k are clustered together and 0 otherwise. Elementwise averaging over all these association matrices yields the PPM. Combining the exploration of MAP and pairwise probabilities can narrow down a decision on the most likely partition of the fPC scores.
Although we find limitations for each of these steps individually to draw robust conclusions, considering them together as a whole provides rich information on the (a posteriori) most likely partition for each eigendimension. Particularly in the case of complex phenomena, such as those captured by neuroscientific recordings, a thorough exploration of cluster uncertainty in the data should be always considered to ensure a sensible interpretation of the results. We present an application of these analyses to fMRI and EEG data in Section 4. In a Bayesian mixture model where cluster identification is of interest, extra care should be taken to avoid label switching arising from the symmetry in the likelihood of model parameters. This can be avoided either by imposing identifiability constraints on the parameter space or by employing relabeling algorithms. In our simulation study and applications we found that imposing constraints on the order of cluster means ( 1k <, … , < Jk ) or weights (p 1k < , … , < p Jk ) was enough to successfully control label switching.

fPC score clustering as generalization of standard clustering
In the standard infinite mixture model based clustering, the indicators c i = c i ′ = j with i ≠ i ′ would associate a couple of trajectories to a certain cluster j with probability P ii ′ . On the other hand, by placing infinite mixtures over the fPC scores for every eigendimension retained, we allow for a more complex network of dependence among curves. In our model, c ik and c i ′ k would associate fPC scores i and i ′ to potentially different clusters in every eigendimension k with probability P ii ′ k . It follows that a pair of curves could happen to share the same cluster in only part of the K eigendimension retained, expanding the standard model based clustering to a richer classification method. Furthermore, as each dimension represents a mode of variation (eigenfunction) and its importance (eigenvalue), our method offers additional insights into the underlying spatiotemporal structure of the data. In the following sections we show how clustering fPC scores produces a rich spatiotemporal exploration of complex neuroscientific data.

Simulation scenarios
We performed a simulation study to assess the performance of PCl-fPCA model and compare it to the standard Bayesian fPCA model in terms of both curve reconstruction and classification for different data generating processes and noise levels. We also included for comparison two frequentist approaches: the standard fPCA model 1 and a modified version of the model by Liu et al 7 that we adapted to the features of neuroscientific data. In this latter model, curve dependence is captured through the fPC scores by means of independent Matérn functions for each eigendimension retained.
In order to test model performance with simulated data matching those of the targeted neuroscientific applications as closely as possible, we generated two eigenfunctions from simulated data resembling evoked responses in the brain using the function pca.fd from the fda package in R. 39 Subsequently, we defined three data generating processes (DGP) that For each DGP, we combined the two eigenfunctions with the fPC scores to build the simulated data sets. We applied a random Gaussian noise and tested the models with both high and low signal-to-noise ratios (STN = 6 and 1, respectively). Figure 1 shows an example from the set of 100 generated curves in DGP1 where either a low or high random noise is added.
One hundred data sets (L = 100) for each DGP and STN were input to fPCA first for curve smoothing using cubic B-splines and dimension reduction by estimating the respective eigenvalues and eigenfunctions using the function pca.fd from the fda package in R. 39 We retained a number of dimensions K explaining at least 95% of the total variability in curves. Figure 1 shows eigenfunctions and their weights extracted after smoothing a set of low-noise curves for the first DGP.
We adapted the general model presented in Section 2.2 to the specific simulation analysis using eigenvalues k and their properties to develop vaguely informative prior distributions for the parameters r, , and Q (Equations (8) and (9)) in the two eigendimensions retained k = 1, 2. We set r ∈ {1∕̂1, 1∕̂2} and Q ∈ {10, 5} as well as setting s j,1 ∼ Γ(1, 1 ) and j2 ∼ U[0, . The use of a uniform distribution in the second dimension favors the search of smaller clusters than in the first eigendimension, as increasingly local features should be expected in trailing modes of variation. 7 We made sure that even the smallest upper bound Q of the dispersion parameter distribution represented an expected number of clusters a priori far higher than the ground truth. 40 We coded the model in R using the rjags package, 42 and employed a conservative approach using 100 000 iterations for the burn-in and retaining the subsequent 100 000 MCMC iterations. 33,43 The convergence diagnostics did not suggest lack of convergence for all the parameters of interest. We used a thinning of five to store results from 100 simulated data sets efficiently (approximately 70 MB each with K = 2). It takes 36 minutes on average to complete one simulation run on a 2-core Intel CPU running at 2.7 GHz with 8 GB RAM.
We used integrated mean squared error (IMSE) to measure and compare reconstruction performance between PCl-fPCA model and the competitor models. IMSE and its associated approximation for every curve i are given by where the expectation is taken with respect to the underlying curve x i . The IMSE is a useful measure of performance in density estimation and is frequently used in curve reconstruction. 44,45 In addition, as curves correlation ii ′ is often of interest in neuroscientific applications (eg, for measuring the degree of functional connectivity between brain areas), we measured correlations reconstruction using the L2 norm ||̂i i ′ − ii ′ || 2 and compared it with those of the competitor models. In order to assess the proposed model clustering performance in DGP1, we adopted the Adjusted Rand Index (ARI) to quantify the similarity between the estimated partitions (using MAP) and the ground truth for every simulated data set l and eigendimension k. The ARI is commonly used in the literature to assess clustering performance as it varies between exact partition agreement (1) and when partitions agree no more than is expected by chance (0). 36,46 Moreover, we measured the improvement in distance (L 2 norm) between the posterior pairwise probability matrices and the ground truth to evaluate the clustering performance of PCl-fPCA model by taking into account cluster uncertainty. Further details on the simulations setting can be found in WebC section of the Supplementary Material.

Simulation results
Results of curve and correlation reconstruction are reported in Figure 2. The case where STN = 1 is particularly relevant because neuroscientific data are usually affected by high noise. In this scenario, PCl-fPCA model highly improved curve reconstruction compared to all competitor models as 100% of the true curves were better recovered under PCl-fPCA and the median improvement in IMSE ranged from 22% to 45%. Moreover, a similar improvement was also obtained for DGP2 where clustering is present in only one eigendimension (Figure 2, bottom left). In addition, correlation reconstruction was also better achieved under PCl-fPCA with a median percentage of improvement ranging from 20% to 30% for DGP1 and 2% to 8% for DGP2 (Figure 2, right column). In the case of low noise (STN6), the proposed model still performed better than the competitors for DGP1 and achieved values of IMSE and RMSE similar to those of the best competitor models in DGP2 (Supplementary Material, WebB, Figure 2). Interestingly, even when no clusters are expected in both eigendimensions (DGP3), the performance of the PCl-fPCA was still comparable to the best ones achieved by competitor models for both low and high noise levels (Supplementary Material, WebB, Figure 3). The performance of the PCl-fPCA model in terms of classification is reported in Table 1. The proposed model scored high in the ARI classification index in both eigendimensions studied; two and three clusters were expected in the first and second dimension, respectively, in DGP1. Clusters in the first eigendimension were always correctly identified by ARI for both high and low signal to noise ratios. The identification of three clusters in the second eigendimension was more challenging as they were smaller and nearer to each other; however, scores near 1 were almost always obtained when the low noise scenario was tested and even in the case of high noise we observed fairly high scores. Similar results  were achieved by measuring the improvement in distance (L2 norm) between the posterior pairwise probability matrices and the ground truth to account for cluster uncertainty in the classification performance (Supplementary Material, WebC, Table 3).

APPLICATION
In this section we present two applications of the PCl-fPCA model to the analysis of neuroscientific data from fMRI and EEG recordings. In Sections 4.1 and 4.2, the PCl-fPCA model is used to explore underlying brain patterns arising from a short-time window fMRI recording of a healthy subject at rest. In the emerging field of dynamic functional connectivity, the analysis of the evolution of brain patterns within a short-time window is of particular interest as it could uncover transient configurations of coordinated brain activity. 47 The aim of the present fMRI analysis is to verify whether the results obtained on a short-time window recording (1 minute) are in line with the current knowledge on brain resting-state networks obtained from static functional connectivity studies where results are typically averaged over 5 to15 minutes recordings. In Sections 4.3 and 4.4 the PCl-fPCA model is used for artifacts identification in the EEG recording of a healthy subject under a two-stimuli paradigm (match vs unmatch images). The presence of artifacts originating from sources different from the brain and contaminating brain signals is a well-known problem in EEG recordings and an active area of research in neurophysiology. 48 The aim of the present EEG analysis is to check whether the fPC-PCA model can be successfully used to identify the spatiotemporal features of different artifacts and the location of the relative affected brain areas.

fMRI setting
The study relates to a 30-year-old healthy woman volunteer who underwent a resting-state fMRI at the Department of Radiology, Scientific Institute Santa Maria Nascente, Don Gnocchi Foundation (Milan, Italy) during February 2015. The recording was carried out using a 1.5 T Siemens Magnetom Avanto (Erlangen, Germany) MRI scanner with eight-channel head coil. The subject was asked to lie down in the MRI machine in supine position with eyes closed while blood oxygenation level dependent echo planar imaging (BOLD EPI) images were acquired. She was instructed to keep alert and relaxed; no specific mental task was requested. High-resolution T1-weighted 3D scans were also collected to be employed as anatomical references for fMRI data analysis. Standard preprocessing involved the following steps: motion and EPI distortion corrections, nonbrain tissues removal, high-pass temporal filtering (cut-off 0.01 Hz), and artifacts removal using the FMRIB ICA-based Xnoiseifier (FIX) toolbox. 49 After the preprocessing, the resulting 4D data set was aligned to the subject's high-resolution T1-weighted image, registered to MNI152 standard space and resampled to 2 × 2 × 2 mm 3 resolution. One minute length series (sampled at 0.5 Hz) were extracted as the average signal within each of 90 regions of interest (ROIs) according to the automated anatomical labeling (AAL90) coordinates. The resulting 30 × 90 data set was input to fPCA for curve smoothing and dimension reduction using the pca.fd function from the fda package in R. 39 The set of 90 smooth curves and the retained eigendimensions are shown in Figure 5 of Supplementary Material WebB. We kept the first three dimensions explaining more than 85% of the total variability while accounting for more than 10% each.
We adapted the general model in Section 2.2 following the approach taken in the simulation study (Section 3.1), favoring global patterns in the first eigendimension and local patterns in the remaining dimensions. We assessed convergence using trace plots and BGR diagnostics and the number of independent retained samples by computing the effective sample size (WebD, Supplementary Material). We employed the same computational approach described in Section 3.1 and it took 59 minutes to run the analysis with K = 3 on a 2-core Intel CPU running at 2.7 GHz with 8 GB RAM. Furthermore, we carried out a sensitivity analysis by varying the values of the hyperparameters , Q and the distribution of s in each dimension (WebA, Supplementary Material).

fMRI analysis results
The posterior probabilities associated with the single cluster (ie, no clusters) scenario were 0.012, 0.124, and 0.058 for the three eigendimensions k, respectively. The Bayes factors for the first eigendimension was 0.53, which indicates some evidence against no clusters. Conversely, the second and third dimensions returned BF = 2.93 and 1.33, respectively, which can be interpreted as evidence in favor of a single cluster. It is worth noting that, as the implied prior probabilities were highly in support of multiple clusters, the BF for k = 2 and 3 show a diametrical change from prior to posterior belief. These results are also confirmed by a BF sensitivity analysis which is reported in the Supplementary Material (WebA). Figure 3 shows the posterior probability for a cluster being empty and the posterior distributions of cluster size given it is not empty. Two to three clusters seem to emerge in dimension 1; the size of the second cluster (Cl2, second from the right in Figure 3, bottom-left panel) has a peak around 20%, very small mass near zero, and a very low probability of being empty. The third cluster (Cl3) has a size peaking at 12% but more mass near zero and a higher probability of being empty. On the other hand, dimensions 2 and 3 seem to suggest the presence of no more than one cluster each. The second cluster in both these dimensions has higher probability of being empty and the distributions of size have much more mass around zero. Furthermore, the distributions of the first cluster (Cl1) in both dimensions have a notable peak around 90% suggesting that, even when more than one cluster is considered, the large majority of fPC scores in dimensions 2 and 3 tends to be gathered within a single large cluster.
The use of MAPs suggests there might be no more than two groups in the first dimension and one group in the second and third dimensions. Clustering with MAPs in the first dimension identified 9% of curves whose trajectories are wigglier and with a visibly shorter interpeak difference between the first positive and negative peaks compared with the other group (WebB, Figure 6). Figure 7 of section WebB in the Supplementary Material shows an example of curve reconstruction using the posterior mean and 95% pointwise credible bands of the subject specific mean. Curves in cluster 2 pertain to brain areas from the occipital lobe (calcarine, cuneus, lingual, inferior occipital gyrus) and parietal lobe (precuneus).
By analyzing the pairwise probability matrix, a more comprehensive classification emerged. The previously dichotomous partition in dimension k = 1 is now enriched by a third group of brain areas with no clear clustering preference (gray band at the top-right of the pariwise probability matrix in Figure 4). Cluster 2 comprises 16% of curves which all represent areas from the occipital lobe (yellow-light dots), while curves in cluster 3 (blue-dark dots) belong to the cingulate cortex (middle and posterior cingulate cortex), parietal (parietal superior lobule, precuneus), and temporal (middle and inferior temporal gyrus) lobes ( Figure 4).
We note that these three clusters are supported in the neuroimaging literature. It is well established that primary and extrastriate visual regions are active at rest 50 and have a role in processing mental imagery. 51 Just outside the visual cortex, the temporal inferior gyrus takes part to the visual ventral stream which links information from the visual cortex to memory and recognition. 52 Moreover, the posterior cingulate cortex is known to interact with several different brain networks simultaneously and it participates in the default mode network together with part of the parietal lobe. 53 Conversely, it has been suggested that areas pertain to the prefrontal cortex (all included in cluster 1) have less long-range connectivity in the resting state condition. 54 Finally, the sensitivity analysis further confirmed our findings as they were robust to changes in both shape and value of the hyperparameters (WebA, Supplementary Material).

EEG setting
For our second application we employed data from an EEG study on brain activations following object recognition tasks (event related potentials, ERPs). 55 ERPs are very small bioelectrical signals generated by the brain in response to specific events or stimuli. They are EEG changes time locked to motor, sensory or cognitive events that provide a noninvasive approach to study psychophysiological correlates of mental processes. 56 By contrast, body or eye movements introduce large artifacts to EEG recordings and trials contaminated with artifacts need to be corrected or even discarded. 57 In the present study we employed the PC-fPCA model for artifacts identification in the EEG recording of a single healthy subject. The individual was presented with two separate stimuli in the forms of images taken from the 1980 Snodgrass and Vanderwart picture set. 58 The second stimulus was either a different image (unmatch) or the same image (match) as in the first stimulus. We used the data-driven clustering of the PCl-fPCA model to identify the spatiotemporal features of different artifacts and the relative affected brain areas. The data were recorded using a cap with 64 electrodes placed on the subject's scalp and the brain activity at each recording electrode was sampled at 256 Hz for 1 second. Further details on the recording setting can be found in Zhang et al. 55 We considered both the unmatched and matched tasks within the same analysis and used our PCl-fPCA model to find data-driven differences in the morphology of the curves. Therefore, a 128 × 256 data set was input to fPCA for curve smoothing and dimension reduction using the pca.fd function from the fda package in R. 39 The set of 128 smooth curves and the retained eigendimensions are shown in Figure 8 of Supplementary Material WebB. We kept the first two dimensions explaining 90% of the total variability while accounting for more than 10% each. We applied the same model settings described in Section 4.1; we assessed convergence using trace plots and BGR diagnostics and the number of independent retained samples by computing the effective sample size (WebD, Supplementary Material). We employed the same computational approach described in Section 3.1 and it took 64 minutes to run the analysis with K = 2 on a 2-core Intel CPU running at 2.7 GHz with 8 GB RAM.

EEG analysis results
Two clusters seem to emerge in dimension 1. The size of the second cluster (Cl2, second from the right in Figure 5, bottom-left panel) has a peak around 20%, and a low probability of being empty. The third and fourth clusters (Cl3, Cl4) have both sizes peaking near zero and higher probabilities of being empty. On the other hand, dimension 2 clearly indicates the presence of three clusters with sizes peaking at 60%, 20%, and 20% and very low probabilities of being empty. Furthermore, the distributions of the first cluster (Cl1) in both dimensions have very low mass near 1, supporting the presence of multiple clusters in both dimensions. Both MAP and pairwise probability analyses confirmed the presence of two clusters in the first dimension and three clusters in the second dimension ( Figure 6). The second cluster in the first eigendimension contains all the recordings from electrodes in the frontal areas for both the matched and unmatched tasks (Figure 9, WebB, Supplementary Material). These curves have a marked peak at the end of the recording, indicating a possible artifact (probably originated from eye blinking), and they appear to have two separate underlying patterns. These trends are captured in the clustering of the second eigendimension where the second and third clusters further divided the EEG activity in the frontal brain areas between those recorded during the matched and unmatched tasks (Figure 9, WebB, Supplementary Material). Notably, despite all curves showing more variability toward the end of the recordings, we found that only those from frontal areas have a consistently different behavior from that of the group. This is in line with the work of Zhang et al, 55 where the authors excluded frontal region recordings from part of their analyses because of an inconsistent wave morphology compared with the wave form of the other regions. Frontal areas are known to be prone to recording artifacts particularly from eye movement which might have affected the different wave forms observed in these data. 57 Furthermore, the data-driven separation of frontal area curves into tasks (matched and unmatched) suggests the effect of two separate artifacts on the amplitude of these recordings.

DISCUSSION
The processing of the human brain is a complex phenomenon in both time and space. The modeling of spatiotemporal data sets in the big data era is a challenge becoming every day more demanding as we struggle to keep up with the overwhelmingly larger data sets we are required to make sense of. Moreover, the extraordinary advancements in neuroimaging of the last decades have focused large part of neuroscientists and statisticians' efforts on the spatial domain both in clinical practice and research (see, eg, Durante et al 59 ). Nevertheless, the study of how interactions among brain regions change dynamically during an experiment (ie, dynamic functional connectivity) has recently attracted interest in the neuroimaging literature. 60 In fact, the time domain retains important neurophysiological information on brain functioning and neuronal health and without it we are at risk of drawing partial and possibly wrong conclusions on how the brain works.
In the present study we proposed a model that combines functional PCA and Bayesian nonparametric techniques to explore spatiotemporal data sets flexibly. We combined the idea of introducing spatial dependence among curves through the fPC scores proposed by Liu et al 7  is often the case in brain recordings) and the ability of exploring curves dependence dynamically allowing for different spatial patterns for each eigendimension retained.
Improvements in the reconstruction of high-noise corrupted curves were also reported by Liu et al; 7 in fact, the beneficial effect of accounting for curves similarity is more evident when the true signal is well masked behind the noise. Nevertheless, a direct modeling of large covariance matrices often resorts to the use of common covariance functions to avoid overparametrization. The use of functions such as Matérn or rational quadratic implies a priori knowledge on the shape of spatial dependence. We believe that this approach does not suit highly complex phenomena, such as brain processing, where dependence has a much more elaborate architecture than a simple function of spatial proximity. Clustering the fPC scores allowed us to capture dependence among curve flexibly without the need to estimate the relative spatial covariance matrix. Interestingly, our results suggest that the high flexibility of PCl-fPCA model makes it a very suitable choice even in the cases where a single or even none of the eigendimensions retained support clustering of fPC scores. Further improvements may be derived from modeling the correlation or autocorrelation structure of the noise, although the trade-off with model complexity should be taken into account. 1 DP mixture models have also been used for clustering time series through the clustering of the relative coefficients in a basis expansion representation. Many of these works have focused on global clustering, where curves are clustered together for all their coefficients. [12][13][14][15][16][17][18][19] However, not only in neuroscientific data, but in many other types of functional data, curves might be characterized by regions of heterogeneous behaviors; 61 therefore, some authors have proposed alternative approaches that allow also for local differences in the clustering. 62,63 In the present study we moved from a global clustering of the data to a local clustering of fPC scores to address both the exploration of brain activity data and to improve curve reconstruction. Dunson 62 and MacLehose and Dunson 63 used local clustering only as a means to improve estimation and their methods either neglect intersubject variability in the coefficients (Dunson 62 ) or lack cluster interpretability (MacLehose and Dunson 63 ). By contrast, our approach combines the straightforward interpretation of the eigenfunctions with a local clustering of the fPC scores which account for intersubject variability within each cluster. Therefore, we obtained both an improved curve reconstruction and a rich classification technique. In fact, curves are never identical, they can be potentially assigned to different clusters in each eigendimension, and each eigendimension can have a different number of clusters (see Figure 4 of Supplementary Material WebB for a visual example). In addition, the assumption of separability of the cross-covariance matrix is avoided and complex time-space interactions are captured by the model; as a consequence, this local borrowing of information also improves the reconstruction of the underlying smooth process. Moreover, we benefit from the properties of the fPCA expansion to tune the hyperparameters and improve the MCMC convergence.
Cross-covariance matrices are often intractable if we do not resort to compromises in our models. A sensible compromise should be tailored to the type of specific data. In this study, we compromised with the time domain by using fPCA with a fixed number of eigendimensions while giving flexibility in the modeling of spatial dependence. This served the purpose of breaking off from the separability assumption while, at the same time, favoring interpretation and a simple model structure. The fact that the fPCs are treated as known for posterior inference might affect posterior uncertainty. One possible solution to improve coverage is to employ simultaneous credible bounds. These are a finite collection of pointwise intervals, scaled to achieve a specified coverage probability. Existing approaches include those of Besag et al, 64 Krivobokova et al, 65 and Crainiceanu et al. 66 By means of a simulation study and the analysis of fMRI and EEG data, we demonstrate that PCl-fPCA is effective in recovering the underlying smooth curves and it produces a valuable exploration of the spatiotemporal dependence in brain time series. The next step in our approach is the extension to the modeling of multiple subjects' recordings. There are different challenges to consider in the analysis of groups such as the natural interindividual variability in brain functioning and the dimensionality of the data. We intend to expand our method to replicated data and multiple subjects experiments in our future research. Exploring interindividual patterns of functional connectivity and their uncertainty can help answer important questions not only in the study of brain processes but also in the characterization, early diagnosis and prognosis of brain diseases.