Fused graphical lasso for brain networks with symmetries

Neuroimaging is a growing area of neuroscience devoted to producing data with the goal of capturing processes and dynamics of the human brain. We consider the problem of inferring the brain connectivity network from time-dependent functional magnetic resonance imaging (fMRI) scans. To this aim we propose the symmetric graphical lasso, a penalized likelihood method with a fused-type penalty function that explicitly accounts for the natural symmetric structure of the brain. The symmetric graphical lasso allows one to learn simultaneously both the network structure and a set of symmetries across the two hemispheres. We implement an alternating direction method of multipliers algorithm to solve the corresponding convex optimization problem. Furthermore, we apply our method to estimate the brain networks of two subjects, one healthy and the other affected by a mental disorder, and to compare them with respect to their symmetric structure. The method applies once the temporal dependence characterising fMRI data has been accounted for, and we compare the impact of different detrending techniques on the estimated brain networks.


Introduction
A brain network is a model of a nervous system represented as a set of nodes, also called vertices, interconnected by a set of edges; see Bullmore and Bassett (2011) for a review on the use of brain graphs for modelling the human brain connectome. Within the domain of human brain mapping, great interest has focused on the estimation of brain networks from functional magnetic resonance imaging (fMRI) data (Smith et al., 2011).
Functional MRI is a non-invasive technique for collecting data on brain activity, with good spatial and temporal resolution. Essentially, fMRI measures the increase in the oxygenation level at a specific brain region when an increase in blood flow occurs in response to brain activity. The latent signal in the observed fMRI data is referred to as the blood oxygenation level dependent (BOLD) signal. The BOLD signal arises from the interplay of blood flow, blood volume, and blood oxygenation in response to changes in neural activity. Under an active state, the local concentration of oxygenated hemoglobin increases, with a corresponding increase in the homogeneity of magnetic susceptibility, which, in turn, results in an increase of the MRI signal. Typically, active states are observed in task-based experiments in response to an exogenous event. In recent years, attention has been concentrating on resting-state fMRI (RfMRI) data, collected on subjects at rest and in the absence of any external stimulus, as the key to understanding the neuronal organisation of the brain through the investigation of the spatial and temporal structure of spontaneous neural activity. Smith et al. (2009) carried out an analysis to assess how functional networks at rest match those detected under activation tasks. The authors conclude that the resting brain's functional dynamics fully utilise the set of functional networks exhibited by the brain over the range of its possible tasks. In the review paper by Biswal et al. (2010), RfMRI is described as the candidate approach capable of addressing the core challenge in neuroimaging, i.e. the development of common paradigms for interrogating the functional systems in the brain, without the constraints of a priori hypotheses.
The construction of a network from fMRI data requires first the identification of a set of functional vertices, such as spatial regions of interest (ROIs), and then the analysis of connectivity patterns across ROIs. It is also relevant that the human brain has a natural symmetric structure. More specifically, it is made up of two hemispheres such that for every spatial ROI on the left hemisphere there is a homologous ROI on the right hemisphere. Accordingly, in the brain network one can identify pairs of homologous vertices and edges, and RfMRI studies have suggested a highly symmetric connectivity; see Section 2 for additional details.
In this paper, we address the problem of estimating the brain network from RfMRI data by taking symmetries explicitly into account. Special attention is also paid to the temporal dependence characterising fMRI data and the impact of alternative detrending approaches on the estimated brain network. We remark that we consider undirected graphs, i.e. graphs where the edges are of symmetric type. However, following Højsgaard and Lauritzen (2008), throughout this paper the term "symmetry" refers to similarities across the two hemispheres with respect to their network structure and equality in parameter values.
Undirected graphical models (Lauritzen, 1996) are widely applied in network modelling from fMRI data (see, among others, Marrelec et al., 2006; Smith et al., 2011; Zhu and Cribben, 2018). In this framework, the network structure follows from the sparsity pattern of the inverse covariance matrix of ROI values, and a popular approach to estimate sparse undirected graphical models is the graphical lasso technique (Banerjee et al., 2008; Friedman et al., 2008). This method is based on the optimization of a penalized log-likelihood function, where the role of the penalty term is to encourage sparsity in the network. One drawback of the graphical lasso, in the present context, is the fact that it ignores the symmetric structure of the brain. For this reason, we propose a fused graphical lasso method based on the optimization of a penalized log-likelihood where the penalty function is obtained as the sum of two distinct terms. Like the graphical lasso, the first term encourages sparsity in the solution. On the other hand, the second one is a fused-type penalty (Tibshirani et al., 2005) that encourages symmetry by penalising differences between the left and right hemispheres. As detailed more formally in Section 3, symmetries are implemented in the form of equality constraints between entries of the inverse covariance matrix of ROI values. This leads to a convex optimization problem and we provide an alternating direction method of multipliers (ADMM) algorithm for its solution.
The method applies once the temporal dependence characterising fMRI data has been accounted for and the BOLD signal has been extracted. We assume a simple decomposition of pre-processed RfMRI data into an unobserved signal plus noise. The underlying hypotheses on the two latent components are related to their evolution in time and determine the method adopted for their estimation. As the dynamics of fMRI time series are debated, we shall assess the impact of detrending on the estimated ROI connectivity network using three methods, representative of different approaches to trend estimation, based on different assumptions and including possible misspecification. In particular, we shall specify a linear Gaussian multivariate parametric model, a non-linear observation driven model for unobserved components and possibly heavy-tailed data, and a non-parametric local polynomial regression method. Details are deferred to Section 4.2.
We carry out an extensive analysis and provide an illustration using RfMRI data from two representative subjects who have similar characteristics in terms of age and handedness, though one of the two is healthy while the other has been diagnosed with a mental disorder. We may anticipate that the results show a lack of brain asymmetry in the latter individual.
In summary, the novel contribution of this paper is twofold. Firstly, we introduce a fused graphical lasso approach to estimate sparse undirected graphical models with a specific symmetric structure and provide an ADMM algorithm for its solution. An implementation of the latter, written in the R language (R Core Team, 2020), can be found at https://github.com/savranciati/sgl. Secondly, we compare the impact on the analysis of different detrending techniques, thereby providing insight into the robustness of the estimated network to such a preliminary step.

Related works and possible applications
The novel contribution of this paper pertains to the research area usually referred to as joint learning of multiple graphical models. In this framework, the observations come from two or more groups where each group shares the same variables and some of the dependence structure. Accordingly, every group is associated with a network and it is expected that some edges are common across all groups and other edges are unique to each group. More specifically, the literature has focused on the case where the groups correspond to independent experimental conditions so that every network is a distinct unit, disconnected from the other networks; see, among others, Danaher et al. (2014); Yang et al. (2015). Examples of possible applications include genetic networks (see Danaher et al., 2014) and brain networks from neuroimaging data (see Yang et al., 2015). In the former, the groups are normal tissue from healthy subjects and one or more different types of cancer tissues, whereas in the latter the groups correspond to normal subjects and subjects with different degrees of cognitive impairment.
Our work is motivated by brain network inference where the groups are given by the two hemispheres. The independence assumption does not hold in this case because every observation from the left hemisphere is paired with an observation from the right hemisphere. Thus, existing methods for joint learning of graphical models no longer apply, and one should distinguish between independent samples and paired data, the latter being a largely unexplored area of research. Note also that, unlike the case of independent samples, with paired data the networks associated with different groups are not expected to be disconnected from each other. Although we focus on brain networks, our approach applies to other settings, and another relevant example comes from cancer genomics where control samples are often obtained from histologically normal tissues adjacent to the tumor (NAT), so that every observation from a cancer tissue is paired with an observation from a normal tissue; see e.g. Aran et al. (2017).
A central role in the theory of joint learning of multiple graphical models for independent samples is played by the group and the fused graphical lasso (Danaher et al., 2014). The penalty term of the group graphical lasso encourages both sparsity and a similar structure of the networks. On the other hand, the fused graphical lasso encourages sparsity and, at the same time, the parameters of the model to be identical across groups. In this way, the groups are encouraged to have both similar network structure and identical parameter values, thereby typically resulting in more parsimonious models. Procedures for applying both the group and the fused graphical lasso are not available for paired data and the symmetric graphical lasso introduced in this paper contributes to fill this gap.
Interestingly, the application of the fused graphical lasso to paired data results in a model that belongs to the family of Gaussian graphical models with edge and vertex symmetries, shortly RCON models, introduced by Højsgaard and Lauritzen (2008). Note that, as pointed out in that paper, symmetry restrictions in the multivariate Gaussian distribution have a long history and RCON models can be identified as a special case within this framework. For recent applications of these models see Gao and Massam (2015); Vinciotti et al. (2016); Massam et al. (2018). Although the theory of estimation and testing for RCON models is well-established, a procedure that performs model selection within the family of RCON models is not available. Relevant exceptions are the procedures introduced by Gehrmann (2011) and by Li et al. (2020), within the frequentist and the Bayesian approach, respectively; these are of theoretical interest, but their computational complexity restricts their application to low-dimensional settings. More specifically, the problem of model selection for RCON models is discussed in Gehrmann (2011), where it is shown that the number of RCON models grows super-exponentially in the number of variables. For this reason, Gehrmann (2011) suggested that lasso procedures with fused-type penalties might represent a useful alternative to traditional model selection approaches. The symmetric graphical lasso does not constitute a general solution to this problem but it represents, to the best of our knowledge, the first instance of a lasso procedure specifically designed for RCON models.
A further contribution of the paper is concerned with the prewhitening of fMRI data that, measured at each region, are characterised by temporal dependence. There is a longstanding debate on the dynamic properties of fMRI data, and parametric models have been employed along with fully non-parametric methods. Autoregressive errors have been considered, see e.g. Worsley et al. (2002), Lindquist (2008) and Zhu and Cribben (2018), as well as fractional noise error processes, as in Bullmore et al. (2003), or change point methods, see Aston and Kirsch (2012). Semiparametric methods and high-pass filters are also applied to fMRI data, see Zhang and Yu (2008) and Schmal et al. (2017), who use the Hodrick-Prescott filter as in St. John and Doyle (2015). Lund (2006) concluded that no commonly accepted model for noise in fMRI exists and that regressors may whiten the noise as well as nonparametric smoothing methods. In a Bayesian setting, the relevance of prewhitening has been investigated by Kundu and Risk (2020), who model the temporal covariance under an inverse-Wishart prior. The discussion in the latter paper leads to the overall conclusion that prewhitening is a crucial yet not fully solved step when modeling fMRI data. Against this background, the paper provides a contribution by assessing the impact of different signal extraction methods, applied to the same dataset, on the whitening of the original, temporally correlated, series.

Problem and data description
Structural symmetry of the human brain is concerned with anatomical or physiological similarities between the left and the right hemisphere. Functional asymmetry, by contrast, refers to activity-related differences, much as the left and right hands operate differently despite being anatomically symmetric. As it is related to behavioral differences, functional asymmetry, also known as lateralisation, is usually detected with respect to some specific tasks, the most relevant being connected to language organisation and handedness. Non-invasive methods for exploring brain organisation with respect to lateralisation are electroencephalography, positron emission tomography, and fMRI, the latter being the most used in research, which generally display bilateral activations that contrast with the asymmetric effect of lateralisation.
So far, RfMRI studies have suggested a highly symmetric connectivity. Indeed, along with the recognition of the relevance of analysing the brain at rest, the focus has moved from detecting functional asymmetries to detecting structural symmetries. In some sense, the two approaches are complementary, but clearly task-based analyses tend to evidence asymmetries whereas resting-state analyses are designed to shed light on symmetric structures. In a recent RfMRI analysis, Raemaekers et al. (2018) focus on differences between hemispheres that are reflected in asymmetric functional connectivity in resting-state subjects and recognise that any asymmetries are likely to be relatively minute. They also observe that a direct quantification of the extent of hemispheric symmetry is missing.
The fused graphical lasso procedure introduced in this paper provides a methodological contribution for analysing functional symmetries between the left and right hemispheres of the brain. We apply our method to a multimodal imaging dataset which comes from a pilot study of the Enhanced Nathan Kline Institute-Rockland Sample project. This project aims at providing a large cross-sectional sample of publicly shared multimodal neuroimaging data and psychological information to support and motivate researchers in the relevant scientific goal of understanding the mechanisms underlying the complex brain system. A detailed description of the project, its scope, and technical aspects can be found at http://fcon_1000.projects.nitrc.org/indi/enhanced/. The pilot NKI1 study comprises multimodal imaging data and subject-specific covariates for n = 24 subjects. Detailed information can be found at http://fcon_1000.projects.nitrc.org/indi/CoRR/html/nki_1.html.
For each subject several personal covariates are collected, such as anxiety diagnosis, age, gender, and handedness. The fMRI time series are recorded at p = 70 spatial regions of interest (ROIs), each clustering anatomically close and functionally similar voxels. The way ROIs are defined can depend on the scope of the analysis or on the design of the experiments and it has implications in fMRI analysis, see the discussion by Poldrack (2007). In our case, ROIs are pre-defined according to the Desikan atlas, see Desikan et al. (2006). For each region, additional information on 3-D spatial location, hemisphere and lobe membership is available. As we have ROI-specific information, we apply a region-of-interest analysis approach, based on the given anatomical parcellation. An alternative approach is to conduct a whole-brain voxel-wise analysis, on a finer scale, but such an approach is computationally expensive, sensitive to noise, and often difficult to interpret. The optimal means of combining voxels into functionally distinct regions of interest remains to be determined. The issue of parcellation is largely discussed in Craddock et al. (2012), where the authors develop a spatially constrained spectral clustering approach for group clustering of whole resting-state fMRI data into functionally and spatially coherent regions.
As far as dynamic functional activity is concerned, the dataset we are focusing on is composed of time-series data collected for each of the 24 subjects in an imaging session. This imaging technology monitors brain functional activity at different regions via dynamic changes in blood flow, creating a low-frequency blood oxygen level dependent signal while the subject is not performing an explicit task during the imaging session. In the present NKI1 study, the subjects are simply asked to stay awake with eyes open. Focusing on subject i and on scan k, where i = 1 : 24 and k = 1 : 2, we have a 70 × 404 matrix whose rows contain the dynamic activity data of the brain regions, collected at T = 404 equally spaced times (time lag is 1400 ms).
The data are provided by Greg Kiar and Eric Bridgeford from NeuroData at Johns Hopkins University, who graciously pre-processed the raw DTI and R-fMRI imaging data available at http://fcon_1000.projects.nitrc.org/indi/CoRR/html/nki_1.html, using the pipelines ndmg and C-PAC.

Overview on the methodological framework
Let X^(t) be a p-dimensional time series vector collecting the fMRI series observed on each single subject over p = 70 regions, t = 1, . . . , T, where T = 404 is the length of each time series. We assume the general signal plus noise decomposition

X^(t) = M^(t) + Y^(t),   (1)

where M^(t) is the vector collecting the BOLD signal and Y^(t) is the idiosyncratic noise component. Our input data for the analysis of the ROI network association structure will be the estimate of Y^(t), obtained by contrast as the residual vector once the BOLD signal M^(t) is extracted (see Section 4.2). More specifically, we denote by V = {1, . . . , p} the set indexing the p = 70 brain regions and by Y_V = (Y_1, . . . , Y_70) the zero-mean residual vector, where we have dropped the time index as we assume that the time series dynamics are fully captured by the time-varying BOLD signal (Section 4.2). We assume Y_V ∼ N_p(0, Σ) and consider the ROI connectivity network obtained from the application of the theory of undirected graphical models (Lauritzen, 1996). In this framework, the network structure follows from the sparsity pattern of the concentration matrix Θ = Σ^{-1}. More specifically, if the entry θ_ij of Θ, with i ≠ j, is such that θ_ij ≠ 0, then the brain regions indexed by i and j are connected by an edge in the network. Conversely, for every missing edge in the network the corresponding entry of Θ is equal to zero. Concentrations can be interpreted by exploiting their connection with partial correlation and regression coefficients, because for every pair i, j ∈ V with i ≠ j it holds that (see Cox and Wermuth, 1996, Section 3.2)

ρ_ij|V\{i,j} = −θ_ij / (θ_ii θ_jj)^{1/2}  and  β_i←j|V\{i,j} = −θ_ij / θ_ii,   (2)

where ρ_ij|V\{i,j} is the partial correlation between Y_i and Y_j given the remaining components Y_V\{i,j}, and β_i←j|V\{i,j} is the coefficient of Y_j in the regression of Y_i on Y_V\{i}. Hence, if the brain regions indexed by i and j are not connected by an edge, it holds that θ_ij = 0 and this is equivalent to ρ_ij|V\{i,j} = 0 but also to β_i←j|V\{i,j} = 0 and to β_j←i|V\{i,j} = 0.
Furthermore, in this case, Y_i and Y_j are conditionally independent given Y_V\{i,j}. Every region in the left hemisphere has a homologous region in the right hemisphere, so that the vector Y_V can be naturally partitioned into two subvectors. More formally, we set q = p/2 and let the sets L = {1, . . . , q} and R = {q + 1, . . . , p} index the subvectors Y_L and Y_R associated with the left and right hemispheres, respectively, so that the region relative to Y_i of Y_L is homologous to the region relative to Y_{i+q} of Y_R; furthermore, to shorten the notation, we set i′ = i + q for every i ∈ L. Accordingly, the concentration matrix Θ can be partitioned as

Θ = [ Θ_LL  Θ_LR
      Θ_RL  Θ_RR ].

We investigate the presence of symmetries in the ROI association network that take the form of identities between concentrations in Θ_LL and the corresponding concentrations in Θ_RR. This is motivated by the interpretation of such equality restrictions which, by (2), allows one to identify equality relationships involving partial correlation and regression coefficients. Specifically:

(i) Equalities involving the diagonal entries, θ_ii = θ_{i′i′}, imply equality in partial variances, that is var(Y_i | Y_V\{i}) = var(Y_{i′} | Y_V\{i′}).

(ii) If in addition to the equality θ_ii = θ_{i′i′} in (i) it also holds that θ_ij = θ_{i′j′}, then we have β_i←j|V\{i,j} = β_{i′←j′|V\{i′,j′}}, so that the contribution of Y_j to the prediction of Y_i is identical to the contribution of Y_{j′} to the prediction of Y_{i′}.

(iii) If in addition to the equalities θ_ii = θ_{i′i′} and θ_ij = θ_{i′j′} in (ii) it also holds that θ_jj = θ_{j′j′}, then the partial correlation between Y_i and Y_j is identical to that between Y_{i′} and Y_{j′}; formally ρ_ij|V\{i,j} = ρ_{i′j′|V\{i′,j′}}. It is also worth remarking that in this case it follows from (ii) that both β_i←j|V\{i,j} = β_{i′←j′|V\{i′,j′}} and β_j←i|V\{i,j} = β_{j′←i′|V\{i′,j′}}.
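Relation (2) above can be illustrated numerically. The following sketch (not the paper's code; numpy assumed, with a hypothetical helper `partial_correlations`) computes the partial correlation matrix implied by a concentration matrix:

```python
import numpy as np

def partial_correlations(theta):
    """Partial correlation matrix implied by a precision matrix theta,
    using rho_ij = -theta_ij / sqrt(theta_ii * theta_jj) as in (2)."""
    d = np.sqrt(np.diag(theta))
    rho = -theta / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

theta = np.array([[ 2.0, -0.8,  0.0],
                  [-0.8,  2.0, -0.8],
                  [ 0.0, -0.8,  2.0]])
rho = partial_correlations(theta)
# theta_13 = 0, so variables 1 and 3 are conditionally independent given variable 2
```

Here the zero entry θ_13 = 0 corresponds to a missing edge, and the implied partial correlation ρ_13|{2} is exactly zero.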

Graphical models, graphical lasso and symmetries
We represent the ROI connectivity network by means of an undirected graph G = (V, E) where the vertex set V indexes the brain regions and E ⊂ V × V is a set of edges, which are unordered pairs of vertices. Let Y_V be a multivariate normal random vector with zero mean vector, variance and covariance matrix Σ = {σ_ij}_{i,j∈V} and concentration matrix Σ^{-1} = Θ = {θ_ij}_{i,j∈V}. The concentration graph model (Cox and Wermuth, 1996) with graph G = (V, E) is the family of multivariate normal distributions with Θ ∈ S+(G), the set of (symmetric) positive definite matrices which have zero elements θ_ij = 0 whenever {i, j} ∉ E. The latter model has also been called a covariance selection model (Dempster, 1972) and a graphical Gaussian model (Whittaker, 1990); we refer the reader to Lauritzen (1996) for details and discussion.
Let S be the matrix of (averaged) sums of squares and products for a sample y of n observations of Y_V. The maximum likelihood estimate of Θ is then obtained by maximizing the log-likelihood, which up to constants is proportional to

log det(Θ) − tr(SΘ),   (3)

subject to Θ ∈ S+(G); see Lauritzen (1996, Section 5.2) for details. On the other hand, the structure of a concentration graph can be estimated from data by determining the zero entries of the concentration matrix. We refer the reader to Drton and Maathuis (2017) for a comprehensive review on structure learning for graphical models.
In recent years, much interest has focused on the estimation of concentration graph models through the use of ℓ1 (lasso) regularization. More specifically, Yuan and Lin (2007), Banerjee et al. (2008) and Friedman et al. (2008) proposed the graphical lasso estimator

Θ̂^gl = argmin_{Θ ∈ S+} { −log det(Θ) + tr(SΘ) + λ ||Θ||_1 },   (4)

where minimization is over the set S+ of p × p positive definite matrices, λ ≥ 0 and the ℓ1-norm ||Θ||_1 is the sum of the absolute values of the elements of Θ. The graphical lasso adds to the log-likelihood function from (3) an ℓ1-penalty pushing the solutions to be sparse, in the sense that, due to the geometry of the ℓ1-penalty, typically some of the off-diagonal entries of the concentration matrix are shrunk to exactly zero. The term λ is the regularization parameter that controls the amount of shrinkage applied to the elements of Θ, thereby controlling the sparsity of the solution. Thus, the graphical lasso is an effective procedure that conducts model selection and estimation simultaneously. Finally, we remark that for λ > 0 the minimum in (4) is achieved uniquely because the objective is strictly convex, and this holds true also in high-dimensional settings where p > n.

Højsgaard and Lauritzen (2008) investigated the properties of subfamilies of concentration graph models, named RCON models, obtained by imposing additional equality restrictions between specified entries of the concentration matrix. RCON models are commonly referred to as colored graphical models because equality constraints can be represented by colouring of edges and vertices of the concentration graph G. Edges of the same color correspond to off-diagonal entries of Θ with identical values, and similarly for vertices with respect to diagonal entries. The model is thus identified by the structure of G together with a collection of color classes.
Højsgaard and Lauritzen (2008) showed that, as well as concentration graph models, RCON models are regular exponential families and provided an algorithm for the computation of the MLE of Θ, implemented in the R package gRc (Højsgaard and Lauritzen, 2007).
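The graphical lasso step in (4) can be sketched in a few lines using scikit-learn's `GraphicalLasso` (an illustrative alternative to the R implementation accompanying this paper; `alpha` plays the role of λ):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))  # n = 200 samples, p = 5 variables

# Larger alpha (the lambda of equation (4)) yields a sparser precision matrix.
model = GraphicalLasso(alpha=0.1).fit(X)
Theta_hat = model.precision_  # estimated concentration matrix
```

The zero pattern of `Theta_hat` then gives the estimated edge set of the concentration graph.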

Time series analysis
To assess the impact of detrending on our procedure, we consider three different specifications for the latent components in equation (1). Each one is representative of a wide class of methods for signal extraction and is based on different assumptions on the latent components and their dependence relation. In particular, we specify a Gaussian vector autoregressive model (Section 4.2.1), a univariate Student-t score driven model (Section 4.2.2) and a local polynomial regression filter, namely the Henderson filter (Section 4.2.3). In the univariate case, we shall denote the elements of the vectors X^(t), M^(t), Y^(t) as x^(t), µ^(t), y^(t), respectively.

Vector Autoregressive Models
In the class of multivariate linear models, we consider a first order vector autoregressive process, VAR(1), see Tunnicliffe-Wilson et al. (2015),

X^(t) = Φ X^(t−1) + Y^(t),

where Y^(t) is assumed to be multivariate normal with zero mean, covariance matrix Σ and uncorrelated with X^(s) for s < t. The coefficient matrix Φ ∈ R^{p×p} is required to have eigenvalues that are in modulus smaller than one and it is usually estimated by least squares. Under distributional assumptions on Y^(t), maximum likelihood estimation can be carried out, and for VAR processes of higher order, the order can be selected by means of information criteria.
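The least squares fit of the VAR(1) coefficient matrix can be sketched as follows (an illustrative simulation, not the paper's code; the residuals `Y_hat` play the role of the detrended input to the network analysis):

```python
import numpy as np

rng = np.random.default_rng(1)
p, T = 3, 2000
Phi = np.array([[0.5, 0.1, 0.0],
                [0.0, 0.4, 0.2],
                [0.1, 0.0, 0.3]])  # eigenvalues inside the unit circle

# simulate X_t = Phi X_{t-1} + Y_t with standard normal noise
X = np.zeros((T, p))
for t in range(1, T):
    X[t] = Phi @ X[t - 1] + rng.standard_normal(p)

# least squares: regress X_t on X_{t-1}
A, B = X[1:], X[:-1]
Phi_hat = np.linalg.lstsq(B, A, rcond=None)[0].T
Y_hat = A - B @ Phi_hat.T  # residual vectors, the input to the graphical model
```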

Score driven models
Among nonlinear models for unobserved components, we focus on the class of score driven models, recently introduced by Creal et al. (2013) and Harvey (2013) as flexible observation driven models for time-varying parameters that characterise a given conditional distribution. Specifically, we consider the first order dynamic conditional score (DCS) model for the location discussed by Harvey and Luati (2014), where each x^(t) is assumed to be conditionally distributed as a Student-t random variable with ν degrees of freedom, x^(t) | F_{t−1} ∼ t_ν(µ^(t), σ²), with the filtration F_s representing the information set up to time s. The signal µ^(t) is estimated based on an autoregressive mechanism, μ̂^(t) = ω + φ μ̂^(t−1) + κ û^(t−1), where û^(t) is a realisation of a martingale difference sequence, i.e. E(u^(t) | F_{t−1}) = 0, proportional to the score of the conditional log-likelihood with respect to the time-varying location, i.e. u^(t) ∝ ∂ℓ(µ^(t) | F_{t−1})/∂µ^(t), with |φ| < 1 and μ̂^(0) set equal to a fixed value. In this framework, the dynamic BOLD signal is updated by a filter that is robust with respect to extreme values (Calvet et al., 2015). The robustness comes from the properties of the martingale difference sequence u^(t): if the data arise from a heavy-tailed distribution, then the score û^(t) is less sensitive to extreme values than the score of a Gaussian distribution or than the innovation error v̂^(t) = x^(t) − μ̂^(t). An important property of the proposed specification is that it encompasses the Gaussian case, in that the score of the Student-t converges to that of the Gaussian distribution when the degrees of freedom tend to infinity. In practice, if a score driven model is specified when the underlying dataset is in fact Gaussian, a very high value for the degrees of freedom is estimated and a Gaussian model is eventually fitted with the time-varying parameter updated through the Kalman filter.
The static parameters, ω, ν, φ, κ, σ, are consistently estimated by maximum likelihood and asymptotic standard errors can be derived (see Harvey, 2013;Harvey and Luati, 2014).
It is important to remark that by applying this method, we are taking into account the possibility that the distribution of the input vector Y^(t) is misspecified, as it is allowed to come from a heavy-tailed, rather than Gaussian, distribution.
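The robustness of the DCS recursion can be sketched as follows. This is a minimal illustration, with the Student-t score written up to constants absorbed into κ (the exact parameterisation in Harvey and Luati, 2014, differs in details) and purely illustrative parameter values:

```python
import numpy as np

def dcs_t_filter(x, omega, phi, kappa, nu, sigma2, mu0=0.0):
    """First-order score-driven location filter with a Student-t score.
    The innovation u_t is proportional to the conditional score and
    downweights extreme prediction errors."""
    mu = np.empty(len(x) + 1)
    mu[0] = mu0
    for t in range(len(x)):
        v = x[t] - mu[t]                      # prediction error
        u = v / (1.0 + v**2 / (nu * sigma2))  # robust score-type innovation
        mu[t + 1] = omega + phi * mu[t] + kappa * u
    return mu[1:]

x = np.zeros(10)
x[5] = 100.0  # a single extreme observation
mu_hat = dcs_t_filter(x, omega=0.0, phi=0.9, kappa=0.5, nu=5.0, sigma2=1.0)
```

A Gaussian (Kalman-type) update would move the filtered level by a multiple of the full innovation v = 100, whereas here the score-driven update barely reacts to the outlier.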

Local polynomial regression
Filters that arise from fitting a local polynomial have a well-established tradition in time series analysis and signal extraction, see Cleveland and Loader (1996). With no parametric assumptions on the error term, the signal is approximated locally by a polynomial of degree d, so that in the neighbourhood of time t, for t = h + 1, · · · , n − h, one has, for j = 0, ±1, · · · , ±h, µ^(t+j) = β_0 + β_1 j + β_2 j² + · · · + β_d j^d. Using this design, the estimate of the trend at time t is simply given by the intercept, μ̂^(t) = β̂_0. Provided that 2h ≥ d, the d + 1 unknown coefficients β_k, k = 0, . . . , d, can be estimated by the method of weighted least squares (see Proietti and Luati, 2007), which eventually produces the trend estimate at time t as the result of a weighted average,

μ̂^(t) = Σ_{j=−h}^{h} w_j x^(t+j).

The Henderson filter (Henderson, 1916) arises as the weighted least squares estimator of a local cubic trend, i.e. d = 3, at time t using 2h + 1 consecutive observations. Henderson (1916) addressed the problem of defining a set of weights that maximise the smoothness of the estimated trend, in the sense that the variance of its third differences is minimum. He showed that, up to a factor of proportionality, the resulting weights are

w_j ∝ {(h + 1)² − j²}{(h + 2)² − j²}{(h + 3)² − j²}{3(h + 2)² − 16 − 11j²}.

Note that, with local polynomial regression methods, 2h trend estimates are missing, corresponding to the first and last h time points. Even if the latter are not relevant in the present paper, the reader is referred to Proietti and Luati (2008) for estimation of the signal at the boundaries by asymmetric filters.
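The Henderson weights can be computed and checked numerically; the following is an illustrative sketch (the helper `henderson_weights` is not part of the authors' code), reproducing the classical 5-term Henderson moving average for h = 2:

```python
import numpy as np

def henderson_weights(h):
    """Henderson filter weights for a window of 2h+1 observations,
    w_j ∝ {(h+1)^2 - j^2}{(h+2)^2 - j^2}{(h+3)^2 - j^2}{3(h+2)^2 - 16 - 11 j^2},
    normalised to sum to one."""
    j = np.arange(-h, h + 1)
    w = (((h + 1)**2 - j**2) * ((h + 2)**2 - j**2) * ((h + 3)**2 - j**2)
         * (3 * (h + 2)**2 - 16 - 11 * j**2))
    return w / w.sum()

w = henderson_weights(2)  # 5-term Henderson filter
```

By symmetry the weights annihilate odd powers of j, and since they also satisfy Σ w_j j² = 0, the filter reproduces a local cubic trend exactly, as required by the d = 3 design.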

The penalized log-likelihood
In order to encourage both sparsity in the graph structure and similarity across the two brain hemispheres, we introduce a specific fused-type penalty (Hoefling, 2010; Tibshirani and Taylor, 2011) especially designed to encourage equality between the concentration values of the relevant subgraphs. Hence, we propose the following estimator of Θ, which we name the symmetric graphical lasso estimator,

Θ̂^sgl = argmin_{Θ ∈ S+} { −log det(Θ) + tr(SΘ) + λ_1 ||Θ||_1 + λ_2 ||Θ_LL − Θ_RR||_1 },   (5)

where λ_1, λ_2 ≥ 0 are regularization parameters that control the amount of shrinkage. Equation (5) is obtained by adding to the (minus) log-likelihood in (3) a convex penalty function obtained as the sum of two ℓ1-penalties, i.e. the penalty λ_1 ||Θ||_1 that, like the graphical lasso, for large values of λ_1 encourages sparsity in Θ̂^sgl, and the penalty λ_2 ||Θ_LL − Θ_RR||_1 that, for large values of λ_2, encourages the elements of Θ̂^sgl_LL to be identical to the corresponding elements of Θ̂^sgl_RR (Tibshirani et al., 2005; Danaher et al., 2014). Recall that, as described in Section 3, such equality constraints may, in turn, imply the equality of other quantities of interest, such as regression coefficients and partial correlation coefficients. One of the appealing features of the lasso is that it typically performs model selection and estimation simultaneously. From this perspective, it is worth remarking that the symmetric graphical lasso performs model selection and estimation within the class of RCON models. More precisely, it is suited to identify color classes of the form {θ_ij, θ_{i′j′}} corresponding to θ̂^sgl_ij = θ̂^sgl_{i′j′}, which are of natural interest in the analysis of brain networks.
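To make the objective in (5) concrete, it can be evaluated directly; the following is an illustrative sketch (the helper `sgl_objective` is not the authors' solver, which is the ADMM algorithm of Section 5):

```python
import numpy as np

def sgl_objective(Theta, S, lam1, lam2):
    """Symmetric graphical lasso objective of equation (5):
    -log det(Theta) + tr(S Theta) + lam1*||Theta||_1 + lam2*||Theta_LL - Theta_RR||_1."""
    q = Theta.shape[0] // 2
    sign, logdet = np.linalg.slogdet(Theta)
    assert sign > 0, "Theta must be positive definite"
    fused = np.abs(Theta[:q, :q] - Theta[q:, q:]).sum()
    return -logdet + np.trace(S @ Theta) + lam1 * np.abs(Theta).sum() + lam2 * fused

S = np.eye(4)
# at Theta = I the two hemispheric blocks coincide, so the fused term vanishes
obj_id = sgl_objective(np.eye(4), S, lam1=0.0, lam2=1.0)
```

Breaking the symmetry between the two diagonal blocks (e.g. changing one diagonal entry of Θ_LL) increases the fused term and hence the objective when λ_2 > 0.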

An algorithm for the symmetric graphical lasso problem
In order to solve equation (5) we use an alternating direction method of multipliers (ADMM) algorithm. A comprehensive exposition of the ADMM algorithm can be found in Boyd et al. (2011), whereas we refer to Danaher et al. (2014) and Tan et al. (2014), and references therein, for applications of ADMM to related problems. ADMM is an attractive algorithm for this problem because it allows us to split the optimization procedure into two nested, less involved, convex optimization problems, both of which can be solved using suitable ADMM algorithms.
First, we note that the optimization problem in (5) is equivalent to minimizing with respect to Θ and Z the quantity

−log det(Θ) + tr(SΘ) + λ1||Z||_1 + λ2||Z_LL − Z_RR||_1,   (6)

where Θ and Z are restricted to belong to S+ and subject to the linear constraint Z = Θ. We remark that Z_LL and Z_RR in (6) are the relevant diagonal submatrices of Z. Hence, the scaled form of the augmented Lagrangian can be written as (Boyd et al., 2011, Section 3.1.1)

L_ρ1(Θ, Z, U) = −log det(Θ) + tr(SΘ) + λ1||Z||_1 + λ2||Z_LL − Z_RR||_1 + (ρ1/2)||Θ − Z + U||²_F − (ρ1/2)||U||²_F,   (7)

where U is the scaled dual variable and the symbol ||·||_F denotes the Frobenius norm, i.e. the square root of the sum of the squared entries of its argument. The ADMM algorithm for the optimization of (7) uses the augmented Lagrangian parameter ρ1 > 0 as 'step size' and, when the algorithm is at convergence, due to the constraint Z = Θ, the last two terms of equation (7) cancel out and one obtains the solution to (5). ADMM iterates three fundamental steps in order to minimize (7) (see Boyd et al., 2011, equations (3.5) to (3.7)). More formally, we initialize Z^1 and U^1 equal to the zero matrix and for l = 1, 2, 3, . . . the updates for the quantities (Θ, Z, U) are obtained as (see also Boyd et al., 2011, Section 6.6):

(1) Θ^{l+1} := arg min_Θ { −log det(Θ) + tr(SΘ) + (ρ1/2)||Θ − Z^l + U^l||²_F };
(2) Z^{l+1} := arg min_Z { λ1||Z||_1 + λ2||Z_LL − Z_RR||_1 + (ρ1/2)||Θ^{l+1} − Z + U^l||²_F };
(3) U^{l+1} := U^l + Θ^{l+1} − Z^{l+1}.

The implementation of step (3) is straightforward and in the following we describe steps (1) and (2) in detail.
Step (1) has an analytical solution, with computational complexity driven by the eigendecomposition of a p × p matrix. More specifically, if QDQ⊤ is the eigendecomposition of ρ1(Z^l − U^l) − S, then the solution is given by Θ^{l+1} := QD̃Q⊤, where D̃ is the diagonal matrix with ith diagonal entry (d_ii + √(d_ii² + 4ρ1))/(2ρ1) and d_ii is the ith diagonal entry of D. Note that the diagonal entries of D̃ are always positive because ρ1 > 0 and therefore Θ^{l+1} ∈ S+, as required. Finally, we remark that this step of ADMM coincides with the corresponding step of ADMM for the graphical lasso and the reader can see Boyd et al. (2011, Section 6.6) for further details.
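A minimal sketch of this update (the helper name is ours): the eigenvalues of ρ1(Z^l − U^l) − S are rescaled so that the stationarity condition −Θ⁻¹ + S + ρ1(Θ − Z^l + U^l) = 0 holds exactly.

```python
import numpy as np

def theta_update(S, Z, U, rho):
    """Closed-form solution of step (1) via one eigendecomposition."""
    d, Q = np.linalg.eigh(rho * (Z - U) - S)
    d_new = (d + np.sqrt(d ** 2 + 4.0 * rho)) / (2.0 * rho)  # always positive
    return (Q * d_new) @ Q.T
```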
We turn now to step (2) of the algorithm. For a matrix Q with rows and columns indexed by V = L ∪ R we let v(Q) be the vector defined as v(Q) = (vech(Q_LL)⊤, vech(Q_RR)⊤, vec(Q_LR)⊤)⊤, where vec(·) and vech(·) are the vectorization and half-vectorization operators, respectively. Hence, we set A = (I, −I, O), where I is the identity matrix of dimension q(q+1)/2 and O is the q(q+1)/2 × q² zero matrix, so that Av(Q) = vech(Q_LL) − vech(Q_RR). We can thus write the second step of the main ADMM algorithm in the form

arg min_z { (1/2)||z − b||²_2 + λ̃1||z||_1 + λ̃2||Az||_1 },   (8)

where ||·||_2 is the Euclidean norm, λ̃1 = λ1/ρ1, λ̃2 = λ2/ρ1 and, to simplify the notation, we have omitted the superscript from b. Equation (8) shows that the optimization problem in the second step of ADMM is a special variant of the classical fused lasso called the fused lasso signal approximator. This allows us to exploit known results for this class of problems. More specifically, Friedman et al. (2007, Lemma A.1) showed that if a solution of (8) for λ1 = 0 and λ2 > 0 is known, then the solution for λ1 > 0 can be easily obtained in closed form through a soft-thresholding operation. Hence, we can focus on the solution of

arg min_z { (1/2)||z − b||²_2 + λ̃2||Az||_1 },   (9)

that is a generalized lasso problem (Tibshirani and Taylor, 2011), and an ADMM algorithm for its solution can be found in Boyd et al. (2011, Section 6.4.1). Concretely, the ADMM algorithm iterates until convergence through the following steps:

(i) z := (I + ρ2 A⊤A)⁻¹(b + ρ2 A⊤(v − t));
(ii) v := S_{λ̃2/ρ2}(Az + t);
(iii) t := t + Az − v.

In step (i), I is an identity matrix of appropriate dimension and ρ2 > 0 is the 'step size' for the inner ADMM. In step (ii), S_κ(·) is the soft thresholding operator (see Boyd et al., 2011, Section 4.4.3). The vectors v and t mimic the role of Z and U of the outer ADMM and can be initialized to the zero vector. If we denote by z^[λ1=0,λ2] the optimal solution at convergence of (9), then we can apply Friedman et al. (2007, Lemma A.1) and adjust z^[λ1=0,λ2] element-wise to obtain the optimal solution of (8) for a given λ1 > 0 as z^[λ1,λ2] = S_{λ1/ρ1}(z^[λ1=0,λ2]).
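The soft thresholding operator and the inner ADMM of steps (i)-(iii) can be sketched as follows (a dense, illustrative implementation; b stands for the vector appearing in (8) and lam2 plays the role of λ2/ρ1):

```python
import numpy as np

def soft_threshold(x, kappa):
    """Element-wise soft thresholding operator S_kappa."""
    return np.sign(x) * np.maximum(np.abs(x) - kappa, 0.0)

def generalized_lasso_admm(b, A, lam2, rho2=1.0, n_iter=500):
    """Inner ADMM for the generalized lasso (9); a sketch after
    Boyd et al. (2011, Section 6.4.1)."""
    v = np.zeros(A.shape[0])
    t = np.zeros(A.shape[0])
    M = np.eye(A.shape[1]) + rho2 * A.T @ A  # factored once in practice
    for _ in range(n_iter):
        z = np.linalg.solve(M, b + rho2 * A.T @ (v - t))   # step (i)
        v = soft_threshold(A @ z + t, lam2 / rho2)         # step (ii)
        t = t + A @ z - v                                  # step (iii)
    return z
```

The solution of (8) then follows by one extra element-wise soft thresholding of the returned vector, as in Friedman et al. (2007, Lemma A.1).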
The update Z^{l+1} for step (2) of the outer ADMM algorithm is thus the symmetric matrix such that v(Z^{l+1}) = z^[λ1,λ2]. Finally, as stopping rule we use a tolerance check on the total relative change of the current estimate of the solution. In particular, if ||Θ^m − Θ^{m−1}|| / ||Θ^{m−1}|| is lower than the chosen tolerance, the algorithm is stopped and assumed to be at convergence.

Simulation study
We carry out a simulation study which aims to assess the performance of symmetric graphical lasso in a framework that mimics the structure of the RfMRI data in Section 2. For this reason, we apply our procedure to simulated datasets sampled from normally distributed random vectors Y_V of |V| = p = 70 variables with V = L ∪ R, as in Section 3. We consider two scenarios, denoted by A and B, that differ in their edge and symmetry degrees, with scenario A being sparser than B. The experiment is designed as follows. First, we randomly generate two undirected graphs, G_A and G_B, with edge degrees, computed as the ratio of the number of edges of the graph over the number of edges of the complete graph, p(p−1)/2, equal to d_A = 23.1% and d_B = 31.6%, respectively. Next, for each scenario, we randomly generate 4 positive definite concentration matrices Θ^A_true,i and Θ^B_true,i, for i = 1, . . . , 4, with zero pattern corresponding to the missing edges of G_A and G_B, respectively. The concentration matrices are constructed so that a given proportion of randomly selected pairs of homologous concentrations across the two brain hemispheres share the same value. More specifically, we focus on present edges, with nonzero concentration values, and the proportion of pairs of symmetric nonzero concentrations is d^A_sym = 10.8% for scenario A and d^B_sym = 30.1% for scenario B. The 8 generated concentration matrices characterize 8 normal distributions with zero mean vector, and from each of these distributions we extract 9 i.i.d. samples of size n = 400, so as to resemble the sample size of the data in Section 2.
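The generation of one such concentration matrix can be sketched as follows (parameter names and the diagonal-dominance trick for positive definiteness are ours; the paper's exact generation mechanism may differ):

```python
import numpy as np

def simulate_symmetric_theta(p=70, edge_prob=0.2, sym_prob=0.3, seed=0):
    """Sparse positive definite concentration matrix over V = L u R
    (q = p/2 nodes per hemisphere), in which a random subset of
    homologous off-diagonal entries is forced to share the same value."""
    rng = np.random.default_rng(seed)
    q = p // 2
    # random sparse symmetric support with signed values in [0.5, 1]
    upper = np.triu(rng.random((p, p)) < edge_prob, k=1)
    vals = rng.uniform(0.5, 1.0, (p, p)) * rng.choice([-1.0, 1.0], (p, p))
    theta = upper * vals
    theta = theta + theta.T
    # copy a fraction of the LL block onto the homologous RR entries
    for i in range(q):
        for j in range(i + 1, q):
            if theta[i, j] != 0 and theta[q + i, q + j] != 0 and rng.random() < sym_prob:
                theta[q + i, q + j] = theta[q + j, q + i] = theta[i, j]
    # enforce positive definiteness by strict diagonal dominance
    np.fill_diagonal(theta, np.abs(theta).sum(axis=1) + 1.0)
    return theta
```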
In the penalized likelihood framework, a controversial question is how to choose the regularization parameter, and several methods have been proposed in the literature. This issue is beyond the scope of this paper and, in order to avoid the performance of symmetric graphical lasso being confounded by the choice of a selection method, we follow an "oracle" procedure, described in the following. In each of the 72 generated datasets, we apply the graphical lasso and choose the value of λ1 that produces a graph with an edge density equal to the density of the graph used to simulate the data. Next, conditional on the selected value of λ1, we apply the symmetric graphical lasso for 10 different logarithmically spaced values of λ2. For every selected model, we consider some well established measures to assess the performance in recovering the graph structure. The same quantities are then adapted to assess the performance in recovering the symmetric structure. Specifically, we compute the edge positive-predicted value (ePPV), also called precision, as the ratio between the number of true edges (eTP) and the number of edges (#edges) in the selected graph, and the symmetry positive-predicted value (sPPV) as the ratio between the number of true symmetric nonzero concentrations (sTP) and the number of nonzero symmetric concentrations (#symm) in the estimated concentration matrix. Furthermore, we compute the edge true-positive rate (eTPR) as the ratio between eTP and the number of edges (eP) in the true graph, and the symmetry true-positive rate (sTPR) as the ratio between sTP and the number of symmetric nonzero concentrations (sP) in the true concentration matrix. Similarly, we compute the edge true-negative rate (eTNR) and the symmetry true-negative rate (sTNR).
In this way, we consider the quantities

ePPV = eTP/#edges,  eTPR = eTP/eP,  eTNR = eTN/eN,   (10)

which we use to assess how well the symmetric graphical lasso procedure recovers the graph structure, and the quantities

sPPV = sTP/#symm,  sTPR = sTP/sP,  sTNR = sTN/sN,   (11)

used to assess the ability of the symmetric graphical lasso to identify symmetries. It is worth remarking that symmetric graphical lasso tends to encourage equality between both zero and nonzero concentrations, but we evaluate its performance only with respect to nonzero concentrations, whose identification is of greater interest in applied contexts. Table 1 summarises the behaviour of the symmetric graphical lasso for a dataset in scenario A. More specifically, we report the performance measures for the model selected by the graphical lasso and each of the 10 models obtained from the 10 values of λ2 in the application of the symmetric graphical lasso. As shown in the first two lines of Table 1, the results for the graphical lasso and the symmetric graphical lasso with the lowest value of λ2 are virtually identical, which is expected given the equivalence between our proposed approach and the graphical lasso when λ2 = 0. Increasing values of λ2 tend to increase the sparsity of the selected graph, which can be explained by the fact that symmetric graphical lasso encourages symmetries also for zero concentrations. As a consequence, increasing values of λ2 tend to correspond to a moderate decrease in the values of ePPV and eTPR; on the other hand, eTNR tends to increase as λ2 increases.
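As an illustration, the edge measures in (10) can be computed from the adjacency matrices of the true and estimated graphs (a sketch; the symmetry measures in (11) are analogous, applied to homologous concentration pairs):

```python
import numpy as np

def edge_recovery_measures(adj_true, adj_est):
    """ePPV, eTPR and eTNR computed on the upper triangles of the true
    and estimated adjacency matrices."""
    iu = np.triu_indices_from(adj_true, k=1)
    t, e = adj_true[iu].astype(bool), adj_est[iu].astype(bool)
    eTP = np.sum(t & e)
    eTN = np.sum(~t & ~e)
    ePPV = eTP / max(e.sum(), 1)   # true edges over selected edges
    eTPR = eTP / max(t.sum(), 1)   # true edges over edges in the true graph
    eTNR = eTN / max((~t).sum(), 1)
    return ePPV, eTPR, eTNR
```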
To choose the value of λ2 we adopt the "oracle" criterion that selects the model corresponding to the highest sum sTPR+sTNR, highlighted in bold in Table 1. We note that this corresponds to the sparsest graph, with 496 present edges. Focusing on the symmetric concentrations, we see, as expected, an increase in sTPR for increasing values of λ2, with a steady decrease in terms of sTNR, which retains an appreciable value of 89.92 for the selected model. Piecing these considerations together, we can see from Table 1 that the price in terms of eTPR and eTNR paid by using symmetric graphical lasso instead of the graphical lasso is worth the additional information gained by recovering the symmetric structure of the two blocks of the concentration matrix, and the associated graph.
We apply this procedure to the 72 generated datasets, thereby obtaining 72 models identified by graphical lasso and 72 models identified by symmetric graphical lasso. The results of these analyses are summarized in Table 2 and Figure 1. These show that, as far as the structure of the graph is concerned, the graphs obtained from symmetric graphical lasso have smaller values of eTPR and ePPV with respect to graphical lasso, and higher values of eTNR. However, the reduction in eTPR and ePPV is moderate in the sparser scenario A, about 10% for eTPR and 2.5% for ePPV, and, in fact, quite small in the denser scenario B, about 5% for eTPR and 1.5% for ePPV. Nonetheless, these moderate reductions in eTPR and ePPV are compensated by an increase in eTNR and, most importantly, by a satisfying performance in terms of recovery of symmetries. Interestingly, the symmetric graphical lasso seems to have a very similar behaviour in the two scenarios as far as sTPR and sTNR are concerned, but scenario B shows higher values of sPPV.

Analysis of RfMRI data
The symmetric graphical lasso is applied to the data described in Section 2. The analysis is focused on two subjects, indexed as subject 18 and subject 22, who show homogeneous characteristics in terms of age and handedness but different diagnosis status. According to the available explanatory variables, subject 18 is 46 years old, right-handed, and healthy, whereas subject 22 is 42 years old, right-handed as well, but had a current and recurring diagnosis of drug abuse and mental disorders at the time of the fMRI scan recording.
We first account for the temporal dependence, with the tools and methodologies discussed in Section 4.2. In particular, for each subject, we obtain a matrix of residuals of dimension n × p from a VAR(1) model, a first order score driven model, and a Henderson filter with h = 6 (a 13-term weighted average). We then apply the symmetric graphical lasso to the residuals. As criteria for choosing the optimal values of λ1 and λ2 we use the Bayesian Information Criterion (BIC) and the extended BIC (eBIC; Foygel and Drton, 2010), eBIC_γ = −2 l(Θ̂_mle) + d log n + 4 γ d log p, where l(Θ̂_mle) and d denote the maximized log-likelihood function and the number of free parameters of the relevant model, respectively. The eBIC depends on the parameter γ ∈ [0, 1] that controls how much the criterion prefers simpler models; the limit case γ = 0 corresponds to the classical BIC. As suggested in Foygel and Drton (2010) we set γ = 0.5. For the computation of the maximum likelihood estimates within the family of RCON models, we use the gRc package for R (Højsgaard and Lauritzen, 2007). As a joint grid search over λ1 and λ2 could be computationally prohibitive (see also Danaher et al., 2014), we first fix λ2 to a low value, close to zero, while performing a dense grid search over λ1. After selecting the best value of λ1, a conditional sweep on a grid of 20 equally spaced values on a logarithmic scale for λ2 is performed to select the final pair of optimal values (λ1, λ2).
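The criterion itself is a one-liner; a sketch following the form in Foygel and Drton (2010), with argument names of our choosing:

```python
import numpy as np

def ebic(loglik, d, n, p, gamma=0.5):
    """Extended BIC: the classical BIC plus a 4*gamma*d*log(p) term
    that further penalizes model size; gamma = 0 recovers BIC."""
    return -2.0 * loglik + d * np.log(n) + 4.0 * gamma * d * np.log(p)
```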
The empirical results of the graphical lasso (gl) and symmetric graphical lasso (sgl) fits on the residuals for the two subjects are reported in Table 3 (vector autoregressive model, VAR), Table 4 (score driven model, DCS) and Table 5 (Henderson filter, H13), according to the different filtering techniques. For the sake of comparison, we report the results obtained by using both eBIC and its limit case BIC. However, we focus on the models obtained from the minimization of eBIC, which are more parsimonious than the corresponding models selected by BIC, in particular for the symmetric graphical lasso. A more direct visual representation of the results detailed in Tables 3, 4 and 5 is provided by Figures 2 and 3, which summarise the brain symmetry structure. Specifically, the edges of the graphs encode symmetric nonzero off-diagonal concentrations whereas shaded vertices denote symmetric diagonal concentrations. While Figure 2 summarises the results for the two subjects across the three filtering methods, Figure 3 reports the symmetries which turn out to be common to the three methods for subject 18 (left) and 22 (right). Moreover, in every figure we omit non-symmetric edges from the visualization, in order to highlight the novel aspect of the analysis and to provide the reader with graphs that would otherwise be too densely plotted to be readable; nevertheless, the overall edge density for each result is reported in the summary Tables 3 to 5.
The first evident result is that, regardless of the filtering method, subject 22 shows a denser and more symmetric graph than subject 18. Also, the three models selected by eBIC for subject 22 have both similar densities and similar amounts of symmetric edges and nonzero concentrations; compare, for instance, the 649 edges of subject 22 and the 373 edges of subject 18 in Table 3, related to VAR estimation. On the other hand, filtering has an impact on subject 18, as it is evident that the three methods induce a different density (from 11.68% of H13 in Table 5, to 21.70% of DCS in Table 4) and a different amount of symmetries in the resulting graph (from the 42 pairs of symmetric edges of H13 to the 123 of DCS). These considerations are even more pronounced in Figure 2, which shows how the graph associated with local polynomial regression is the one exhibiting the lowest number of symmetric concentrations, both diagonal (shaded nodes) and off-diagonal (black solid lines). The second lowest number of symmetric concentrations is observed for the graph identified from the residuals of a VAR(1) model, whereas the residuals from the score driven model lead to the graph with the highest number of symmetric off-diagonal concentrations. This is in line with the idea that a robust detrending method leaves more information in the residuals; on the other hand, methods that tend to overfit the data, such as a high-degree local polynomial regression, may allocate most of the data dependence structure in the signal component, rather than in the noise. This can be quantitatively assessed by comparing the number of pairs of symmetric concentrations, both off-diagonal and diagonal, reported in the last two columns of Tables 3, 4 and 5.
Using different filtering techniques also allows one to extract the information on the symmetric structure that is common across the three filtering methods, by retaining the shared graph of the concentrations for each subject, i.e. the graph resulting from the intersection of the three graphs in Figure 2. After this intersection, and in order to give a better insight, we juxtapose the two resulting graphs in Figure 3. In this side-by-side comparison, subject 22 exhibits a higher number of off-diagonal concentrations, and it is worth noting that both subjects seem to share approximately the same number of core symmetric diagonal values.
In conclusion, we may envisage two main empirical findings emerging from the analysis. The first concerns the fact that the subject with a diagnosis of disorder shows a more symmetric brain structure than the healthy one. This is in line with several studies reporting evidence of a lack of asymmetry in schizophrenic patients, see Sun et al. (2015). Nevertheless, the literature is quite controversial on this theme, see the review paper by Stephane et al. (2001, Section 3.1.4), and our results refer to a single pair of subjects. Secondly, the impact of the detrending method appears to be stronger in the subject who presents a less symmetric brain structure, while it seems to be irrelevant in the subject who shows a more defined symmetric pattern. In any case, the combination of different filtering methods may shed light on the core symmetries that characterize the brain of different subjects.

Discussion
The procedures introduced in this paper have been developed with a focus on the identification of brain networks from RfMRI data. In this respect, we have first considered the problem of removing the temporal dependence from the data and then we have designed our methods and algorithms to suit the natural partition of the brain into hemispheres. Nevertheless, symmetric graphical lasso identifies a model within the class of RCON models and, as such, it has a potentially wider range of applications. To the best of our knowledge, the symmetric graphical lasso proposed in this contribution is the first instance where the lasso procedure is used to perform model selection within the class of colored graphical models. In this way, we are able to identify symmetries characterized by equality constraints on the entries of the concentration matrix. Methods not explicitly designed to identify symmetric nonzero concentrations, such as the graphical lasso, can still identify symmetries in the graph structure, but only in terms of edges being present or absent. Tables 3, 4 and 5 compare the graphical lasso and the symmetric graphical lasso in their ability to identify symmetric edges, and show that our method encourages structural symmetries without affecting the graph sparsity.
Our proposed approach has a computational complexity comparable with that of methods using penalties of similar type, such as a conventional fused or group lasso (Danaher et al., 2014). The main computational effort associated with using the ADMM algorithm to solve this class of convex optimization problems is the eigendecomposition of a p × p matrix (Tibshirani et al., 2005). In the application considered in this manuscript, the dimensionality of the problem is bounded by the atlases used in resting state fMRI studies: usually, these atlases identify a number of ROIs close to p = 70, as in the data analyzed in Section 5, which makes the estimation times of our procedure not a concern. If other applications are considered, such as cancer genomics, where the number of variables is in the order of thousands, then exploring potential pre-screening procedures would be a very interesting avenue to pursue, perhaps adapting those mentioned in Danaher et al. (2014) and Yang et al. (2015) to our symmetric penalty. Future research directions involve the specification of a convex penalty function that allows a more flexible specification of color classes, which could affect other sub-components of the concentration matrix, as well as the consideration of different types of constraints.

Software
The code implementing the ADMM algorithm described in this paper is available at the following GitHub repository: https://github.com/savranciati/sgl.

Table 1: Measures (10) and (11) from the application of graphical lasso (gl) and symmetric graphical lasso (sgl) to one of the datasets in the simulated scenario A. All values, except #edges and #symm, are reported in percentages and the line in bold corresponds to the model for which sTPR+sTNR is maximal.

Table 2: Measures (10) and (11) for the models selected by graphical lasso and symmetric graphical lasso on the 72 simulated datasets.

Table 3: Empirical results for graphical lasso (gl) and symmetric graphical lasso (sgl) fit on residuals from a VAR(1) model on two subjects. The last three columns provide a description of the symmetric structure, that is the number of pairs of symmetric: (i) edges, (ii) off-diagonal nonzero concentrations and (iii) diagonal concentrations.

Table 5: Empirical results for graphical lasso (gl) and symmetric graphical lasso (sgl) fit on residuals from a local polynomial regression with d = 3 (Henderson filter weights) and h = 6 on two subjects. The last three columns provide a description of the symmetric structure, that is the number of pairs of symmetric: (i) edges, (ii) off-diagonal nonzero concentrations and (iii) diagonal concentrations.

Figure 1: Difference between a given performance measure (top panel: ePPV; middle panel: eTPR; bottom panel: eTNR) computed on the model selected by symmetric graphical lasso and the same measure computed on the model selected by graphical lasso, for each of the 8 × 9 datasets. Every boxplot summarises the 9 datasets of the corresponding environment.