Spatiotemporal analysis of event‐related fMRI to reveal cognitive states

Abstract Cognitive science has a rich history of developing theories of processing that characterize the mental steps involved in performance of many tasks. Recent work in neuroimaging and machine learning has greatly improved our ability to link cognitive processes with what is happening in the brain. This article analyzes a hidden semi‐Markov model‐multivoxel pattern‐analysis (HSMM‐MVPA) methodology that we have developed for inferring the sequence of brain states one traverses in the performance of a cognitive task. The method is applied to a functional magnetic resonance imaging (fMRI) experiment where task boundaries are known that should separate states. The method is able to accurately identify those boundaries. Then, applying the method to synthetic data, we explore more fully those factors that influence performance of the method: signal‐to‐noise ratio, numbers of states, state sojourn times, and numbers of underlying experimental conditions. The results indicate the types of experimental tasks where applications of the HSMM‐MVPA method are likely to yield accurate and insightful results.


| BACKGROUND
In cognitive science, our ability to construct theories of cognitive processes has outstripped our ability to adequately test those theories. Cognitive science theories now postulate sequences of processes, potentially progressing in parallel, that take fractions of a second. They postulate distributions on the durations of these processes. They allow for subjects to take alternative solution paths that can lead to the same result. Essentially, each time a subject undertakes to solve a problem they follow a different journey. In contrast to this theoretical richness, the classic measures of cognitive science are very coarse: whether an answer is correct, total time to perform a task, activation in a brain region, and so forth. These measures compress the journey taken on a problem into a single number. There are measures that do try to track the process, notably verbal protocols and eye movements. However, verbal protocols (e.g., Ericsson, 2006) provide descriptions at a temporal grain size much above the speed of cognition and have problems with their reliability. In principle, eye movements (e.g., Kowler, 2011) come closer to the speed of cognition, but this feature is typically not exploited and eye movement data tend to be aggregated into average measures.
Neuroimaging data, on the other hand, have the potential to illuminate and constrain detailed theories of cognitive processing. Their potential for analyzing moment-by-moment processing has certainly been recognized (Gonzalez-Castillo et al., 2015; King & Dehaene, 2014). While the undertaking is not trivial, there have been advances toward bridging the gap between putative cognitive processes and the underlying neural mechanisms that mediate that processing. Techniques for linking behavioral and neural data and combining modeling data from multiple sources are being developed (Borst & Anderson, 2017; Cohen et al., 2017; Rubin et al., 2017; Turner, Forstmann, Love, Palmeri, & Van Maanen, 2017; Turner, Rodriguez, Norcia, McClure, & Steyvers, 2016). (The authors thank Cvetomir Dimov and Qiong Zhang for their comments on the article.)
The HSMM-MVPA method is able to take the brain activity over the course of individual experimental trials and parse it into a sequence of unique brain states and corresponding sojourn times. Each state, or brain signature, is simply a brain activation pattern that is roughly constant during the sojourn time. Figure 1 illustrates a recent application of this method to a novel mathematical task (Anderson, Pyke, & Fincham, 2016; Anderson, Zhang, et al., 2016). Each trial corresponded to solving a particular problem, beginning with the problem presentation and ending with the keying of a response. The method found evidence that when solving these problems, subjects traversed four sequential states, each corresponding to a unique brain signature (labeled here as Encoding, Planning, Solving, and Responding). In addition to estimating the brain signatures, the HSMM-MVPA estimates the corresponding sojourn times for these states on each problem. The figure shows the parses of four problems that differed from one another in their cognitive demands, but that happened to all take 14 s (trial times ranged from 9 to 30 s). The different sojourn times for states across these problems reflect the different cognitive demands of each. Aggregating these trial-by-trial parses allowed us to understand how subjects were solving these problems in a way that we would not have been able to had we limited our analyses to total time spent solving the problems or aggregate brain activity over trials. For instance, Anderson, Pyke, & Fincham (2016) and Anderson, Zhang, et al. (2016) showed that the duration of the Solving stage varied with the number of additions that participants had to perform.
That study and other related work have demonstrated the usefulness of the technique in analyzing imaging data. However, there are methodological issues that need to be addressed so that this method can be appropriately and successfully applied to imaging data. Consider again the data in Figure 1. How do we judge how well this method has actually captured the true state of affairs in these data? Would some number other than four states better represent the data?
How do we define "better" and how do we evaluate such a claim?

FIGURE 1 An illustration of the state sojourn times for four trials where different problems were solved. The four problems are given to the left, where the arrows denote new mathematical operators that students had learned. Each problem has different cognitive demands, and all happened to take 14 s to solve. In each state, the axial slice (z = 28 mm at x = 0, y = 0 in Talairach space) highlights brain regions whose activation in that state is significantly greater than the average activation during the entire problem. Note that brain images are displayed in radiological convention with left brain on the right. Source: Anderson et al. (2015)

In that experiment (and similarly with other experiments in which we have applied the HSMM-MVPA method), individual trials may take up to 30 s (on the order of 1 min in other tasks) to complete. Further, there typically is no clearly defined ground truth from which to begin evaluating the goodness of a particular N-state model. The only verifiable trial markers are stimulus onset and the keying of a response: there are neither experimental task markers nor external cues that indicate when "planning" stops and "solving" begins, for example. If there were such task markers, it would be a straightforward matter to measure the goodness of fit between predicted and actual state boundaries (and of course less need to apply the method in the first place, as we could directly analyze brain activity during such well-defined periods).
The purpose of this article is to assess the veridicality of the state divisions that the HSMM-MVPA method produces and to understand the factors that influence the accuracy of the method and usefulness of results. In the first part of this article, we apply the method to an fMRI experiment (Lee, Fincham, & Anderson, 2015) that, unlike the experiment shown in Figure 1, has well-defined task markers that will provide a ground truth from which we can evaluate the goodness of model fits.
In that context we describe the methodology, the notion of informed and uninformed models, and assessing model goodness using several metrics. This will serve to frame the issues important for using the technique to infer useful models.
In the remainder of the article, we extend the experimental results by applying the method to synthetic datasets that have been generated by differing ground-truth state structures. We will consider a number of factors that influence how well the method performs, including signal-to-noise ratio, numbers of states, state sojourn times, and numbers of conditions. The results will define a "sweet range" for various parameters of experimental tasks where applications of the HSMM-MVPA method are likely to yield both accurate and insightful results.

| THE EXPERIMENT
The experimental data come from Lee et al. (2015). In an fMRI scanner, subjects alternated between studying a new arithmetic rule to fill in a cell of a diagram and solving a problem where they had to apply the new rule (each rule was applied just once). While the textual and graphical complexity was constant, in some cases the critical information was in a graphical example (Figure 2a) of the rule and in others it was in a verbal instruction (Figure 2b) on how to apply the rule. After studying the instruction, participants had to apply it to a problem presented identically for either study condition (Figure 2c). This defined the two conditions of the experiment: Example and Verbal. Figure 2d illustrates the detailed structure of the experiment, which we broke into five phases, for which we recorded onset and offset:
1. Study: Self-paced study, ended by clicking a done button.
2. Delay: A fixed 5-s repetition-detection task and a half-second fixation. In the repetition-detection task, letters appeared on the screen at a rate of 1 every 1.25 s. Participants were told to click a match button whenever the same letter appeared twice consecutively. This was intended to discourage participants from extending their encoding of the instructions and to separate study and solution periods.
3. Solve: Self-paced problem solution, ended by clicking a done button.
4. Respond: A self-paced response period during which subjects entered their answer on a keypad, followed by 1 s of feedback. Subjects were encouraged to have the answer ready to key quickly.
5. Detect: A 6-12 s repetition-detection phase followed by the fixation that preceded the next problem. As with the repetition-detection task in the delay period, letters appeared on the screen at a rate of 1 every 1.25 s. Participants were told to click a match button whenever the same letter appeared twice consecutively.
This was intended to provide a baseline period of activity with minimal cognitive demand.
Adjacent behavioral phases within each experimental trial are distinct from one another in terms of cognitive demands. While a phase may be made up of just one or potentially more successive brain states, each transition from one behavioral phase to another should in principle correspond to a transition between brain states. The first question of interest is how well the HSMM-MVPA method can do in discovering these behavioral phase boundaries. 1

| Behavioral data
The experiment is described in detail in the original study. Here we will focus on describing the resulting data and its processing. Twenty subjects solved three blocks of 24 problems, equally divided into the two conditions. Looking at correct problems (96.8% in the Verbal condition and 93.5% in the Example) and excluding trials with problematic features, 2 there were 654 Verbal problems and 632 Example problems. Individual trials varied from 18 to 56 s with a mean of 28.3 s and a SD of 4.8 s.

The data processing stream first passes whole-brain fMRI activity through standard fMRI processing steps and then converts it into a form appropriate for application of HSMM-MVPA: a small number of orthogonal dimensions of activity, where the activity has been deconvolved so that it represents processing at the point in time that generated the downstream BOLD response. Below we lay out the steps in that processing.

| Imaging processing
Images were acquired using gradient-echo echo planar image (EPI) acquisition on a Siemens 3T Verio Scanner using a 32-channel RF head coil, with 2 s repetition time (TR), 30 ms echo time (TE), 79° flip angle, and 20 cm field of view (FOV). The experiment acquired 34 axial slices on each TR using a 64 × 64 matrix of 3.2-mm-thick slices. This produces voxels that are 3.2 mm high and 3.125 × 3.125 mm in-plane. The anterior commissure-posterior commissure (AC-PC) line was on the 11th slice from the bottom scan slice. Acquired images were preprocessed using AFNI (Cox, 1996; Cox & Hyde, 1997). Functional images were motion-corrected using 6-parameter 3D registration. All images were then slice-time centered at 1 s and co-registered to a common reference structural MRI by means of a 12-parameter 3D registration and smoothed with a 6 mm full-width-half-maximum 3D Gaussian filter to accommodate individual differences in anatomy. A functional mask was created using the common reference brain. For each scanning run (60 runs, 3 runs per subject × 20 subjects), in-mask voxels were selected (47,315 in all) for processing. Within each run, BOLD response values were generated for each voxel time series by normalizing each to have a mean value of 100. Lastly, signal drift was regressed out of each time series using a fourth-degree polynomial.
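As a rough sketch of the last two steps (normalization to a mean of 100 and fourth-degree polynomial detrending), assuming plain NumPy rather than the AFNI tools actually used, and with a hypothetical helper name:

```python
import numpy as np

def preprocess_run(voxel_ts):
    """Normalize one run's voxel time series to a mean of 100 and remove
    fourth-degree polynomial signal drift (sketch; the article's pipeline
    used AFNI for these steps)."""
    ts = np.asarray(voxel_ts, dtype=float)
    ts = 100.0 * ts / ts.mean()               # scale so the run mean is 100
    t = np.arange(len(ts))
    coeffs = np.polyfit(t, ts, deg=4)         # fit the drift polynomial
    drift = np.polyval(coeffs, t)
    return ts - drift + drift.mean()          # remove drift, keep mean level
```

The final line subtracts the fitted drift but adds its mean back, so the detrended series still has a mean of 100.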

| Deconvolution
The primary goal of deconvolution is to take the lagged and diffuse BOLD signal and extract an inferred activity signal for each time point, effectively temporally aligning the signal and behavioral data. The resultant time series for each run from the image-processing step was deconvolved by applying a Wiener filter (Glover, 1999) with a hemodynamic response function to produce the estimate of the underlying activity signal for each 2-s time point of the run. The hemodynamic function is the SPM difference of gammas (Friston, Ashburner, Kiebel, Nichols, & Penny, 2011: gamma(6,1) - gamma(16,1)/6). The Wiener filter also requires specification of a noise-to-signal parameter, and we used a value of .1, as we have in other applications. The Appendix shows that a wide range of plausible noise-to-signal values yields similar results. The deconvolved time series were then l2-normalized, and a spatial PCA was performed. We typically work with the first 20 components, and in this experiment the first 20 accounted for 48.9% of the total variance of the deconvolved data. 3
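A minimal sketch of this deconvolution step, assuming a circular FFT implementation of the Wiener filter and the difference-of-gammas HRF named above (function names are our own, not the authors' code):

```python
import numpy as np
from scipy.stats import gamma as gamma_dist

def spm_hrf(n, tr=2.0):
    """Difference-of-gammas HRF, gamma(6,1) - gamma(16,1)/6, sampled at TR."""
    t = np.arange(n) * tr
    return gamma_dist.pdf(t, 6) - gamma_dist.pdf(t, 16) / 6.0

def wiener_deconvolve(bold, tr=2.0, noise_to_signal=0.1):
    """Estimate the underlying activity signal from a BOLD time series
    with a Wiener filter (sketch; .1 noise-to-signal per the article)."""
    n = len(bold)
    H = np.fft.fft(spm_hrf(n, tr))
    B = np.fft.fft(bold - np.mean(bold))
    # Wiener filter in the frequency domain: conj(H) / (|H|^2 + k)
    W = np.conj(H) / (np.abs(H) ** 2 + noise_to_signal)
    return np.real(np.fft.ifft(W * B))
```

Convolving a spike of activity with the HRF and then deconvolving should recover a peak at the spike's location, smoothed by the noise-to-signal regularization.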

| Principal component analysis
These 20 components were retained and z-scored to ensure all components are on the same scale. Additionally, this transforms the data so that they can be thought of as roughly distributed as multivariate standard normal (though with heavier tails) for the purposes of the likelihood computations used by the method. The result is a 20,747 × 20 matrix. Finally, all time points associated with incorrect trials, and trials containing more than 2 time points with excessive outliers, were removed.
The resulting 18,206 × 20 matrix was used in the HSMM-MVPA analysis. This represented 1,286 trials (654 Verbal problems and 632 Example problems) that varied in length from 9 to 26 time points (points 2 s apart) with a mean length of 14.16 time points. This provides the input to the HSMM-MVPA method (see Figure 4a).
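The dimension-reduction step can be sketched as follows, assuming an SVD-based spatial PCA (the exact implementation used in the original pipeline is not specified in the text):

```python
import numpy as np

def reduce_dimensions(deconvolved, n_components=20):
    """Project deconvolved data (time points x voxels) onto the top
    principal components and z-score each component, as described in
    the text (sketch using a plain SVD)."""
    X = deconvolved - deconvolved.mean(axis=0)        # center voxels
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # PCA scores
    # z-score each component so all are on the same scale
    return (scores - scores.mean(axis=0)) / scores.std(axis=0)
```

Applied to the 20,747 × 47,315 deconvolved matrix this would yield the 20,747 × 20 matrix described above (before removal of bad trials).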

| Methodology
The family of models we consider are those that are purely sequential. There is no branching between states: each state of an N-state model has an associated brain signature and sojourn time distribution, and is guaranteed to follow the previous state 4 (though it is possible to skip states in a sequence). Fitting an HSMM-MVPA model involves estimating a set of parameters that maximize the likelihood of the z-scored PCA data. The parameter estimation process uses an expectation-maximization algorithm (Dempster, Laird, & Rubin, 1977), starting with neutral parameters 5 and iteratively re-estimating parameters until convergence. There are two sets of parameters (see Figure 4b): one set to specify the distribution of state sojourn times and another set to specify the brain signatures.
1. The distribution of sojourn time in each state. The state sojourn time distributions specify the number of time points that a subject spends in a state. We discretize the continuous gamma distribution to obtain a distribution of times that have the skewed property typical of latency distributions. The probability of spending m 2-s time points (a time point accounts for the 2 s it takes to acquire 1 functional brain volume) in state i is calculated as

p_i(m) = ∫_{2m−1}^{2m+1} γ(t; v_i, a_i) dt,

where γ is the gamma density and v_i and a_i are the shape and scale parameters for the continuous gamma distribution for the ith state. Note that this means that the probability of spending 0 time points in the state is the probability in the continuous gamma of a duration of less than 1 s. In these cases, the state is skipped and the model moves on to a successor of that state.
2. The probability of the fMRI activity in the state. The z-scored PCA components are approximately normally distributed. Their z-scoring allows us to treat them as independent normally distributed values around the mean values for the state with a SD of 1. The set F_j of observed component scores f_jk at time j for components k will have probability

P(F_j | M_i) = ∏_k (2π)^{−1/2} exp(−(f_jk − μ_ik)² / 2),

where M_i is the set of means μ_ik estimated for state i.
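Under these definitions, the discretized gamma sojourn distribution can be computed as follows (a sketch; scipy's shape/scale parameterization of the gamma is assumed to match v_i and a_i):

```python
import numpy as np
from scipy.stats import gamma

def sojourn_probs(shape, scale, max_points, tr=2.0):
    """Discretized gamma distribution over numbers of TR-length time
    points, following the text: p(m) integrates the gamma density from
    (2m - 1) to (2m + 1) s, and p(0) is the mass below 1 s (state skip)."""
    # bin edges in seconds: 0, 1, 3, 5, ... for tr = 2
    edges = np.concatenate(([0.0], np.arange(max_points + 1) * tr + tr / 2))
    cdf = gamma.cdf(edges, a=shape, scale=scale)
    probs = np.diff(cdf)
    probs[-1] += 1.0 - cdf[-1]   # lump the remaining tail into the last bin
    return probs                  # probs[m] = P(m time points), m = 0..max
```

The tail lumping is our own truncation choice so the discrete distribution sums to 1 over a finite number of bins.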
FIGURE 4 An illustration of the HSMM-MVPA method for a 4-state solution: (a) The input is the brain activity on each trial, whose dimensionality has been reduced by a principal component analysis; (b) two types of parameters are estimated from these data: brain signatures that characterize brain activity during a state and durations (sojourn times), which specify the probability that states will last different numbers of time points. With these parameters state occupancies can be calculated, which are the probabilities that a time point is in a state. Summing these state occupancies yields estimates of the state sojourn times for each trial

While the underlying model treats each time point in a trial as being in a single state, the estimation process considers all possible ways of partitioning a trial of M time points into N states. 6 Let m1 + m2 + ⋯ + mN = M be one such partitioning, where mi is the number of time points in state i, 1 is the start state, and N is the end state. The probability of this interpretation is the product of the sojourn-time probabilities and the observation probabilities,

∏_{i=1}^{N} p_i(m_i) × ∏_{j=1}^{M} P(F_j | M_{s(j)}),

where s(j) denotes the state that the partitioning assigns to time point j. The estimation process calculates the summed probability of all such partitionings. This is the probability of the data in that trial. HMM algorithms can efficiently calculate the summed probability using dynamic programming techniques (see the discussion of explicit-duration HMMs in Yu, 2010). The parameter estimation process seeks to maximize the summed log-likelihood of all trials.
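The summed probability over all partitionings can be computed with the explicit-duration forward recursion that dynamic programming provides. The following toy sketch (our own, not the authors' code) works in probability space for a single trial:

```python
import numpy as np

def trial_likelihood(emit, dur):
    """Summed probability over all partitionings of M time points into
    N sequential (skippable) states.

    emit[i, j]: probability of the observation at time point j under state i
    dur[i, m]:  probability that state i occupies m time points (m = 0 skips)
    """
    n_states, n_points = emit.shape
    max_m = dur.shape[1] - 1
    # a[i][t] = probability that the first i states account for the first t points
    a = np.zeros((n_states + 1, n_points + 1))
    a[0, 0] = 1.0
    for i in range(1, n_states + 1):
        for t in range(n_points + 1):
            for m in range(0, min(t, max_m) + 1):
                seg = np.prod(emit[i - 1, t - m:t])  # emission prob of the segment
                a[i, t] += a[i - 1, t - m] * dur[i - 1, m] * seg
    return a[n_states, n_points]
```

A real implementation would work in log space to avoid underflow on long trials; the recursion itself is the standard explicit-duration forward pass (Yu, 2010).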
While each partitioning treats a time point as being occupied by a single state, the probability of a state at a time point is the summed probability of all partitionings that assign that state to that time point.
This yields the State Occupancies as illustrated in Figure 4.

| Model selection and fitting
As noted above, the models we are considering are purely sequential.
Nonetheless, such models can be somewhat complex. There are two main decisions that need to be made that impact model complexity and accuracy of the resultant model: identification of the appropriate number of states and whether to use an informed or uninformed model specification.

Number of states
The search for good model fits is currently done using a leave-one-out cross-validation (LOOCV) procedure. Assuming a model with some number of states N, this approach estimates the maximum-likelihood parameters for all but one of the subjects and then uses these parameters to calculate the likelihood of the data from the left-out subject. This likelihood measures the success of using these parameters to predict the data of the left-out subject. The process is rotated through all the subjects and so can calculate the predicted log-likelihood of the data for each subject assuming the N states.
The data of the subjects used for estimation should be fit better with more states because there are more parameters, but this is no guarantee that more states will predict the data of the left-out subject better. If using more states is just overfitting, the predicted likelihoods will not be better. A simple sign test can be used to identify "better" models by noting whether the number of subjects better predicted is more than would be expected by chance. An N-state model is justified if it predicts significantly more subjects better than models with fewer states do. More generally, a model with more parameters is to be preferred over a model with fewer parameters only if it predicts significantly more subjects better.

Model specification
While the number of states is one salient feature determining model complexity, equally important are the number and types of parameters being fit. Given a particular experiment, we can impose various constraints on brain signatures or latency distributions. In the current study, except for the Study phase, the two study treatments result in similar times and variability (Figure 3) among the remaining behavioral phases of a trial. Therefore, we chose an HSMM that is free to estimate different timing parameters and brain signatures for the first state but constrained to fit the remaining states with a set of common parameters between the two experimental conditions. To illustrate and assess the consequence of this choice, we also fit two other HSMMs that started with less knowledge of the experiment. The least informed model did not know about the two study conditions and fit the same time parameters and brain signatures to all trials. A more informed model knew there were two conditions but did not know where the condition differences might lie, and so estimated separate time parameters and brain signatures for all states. These three models are referred to as State-1-Separate, All-States-Shared, and All-States-Separate. As we will demonstrate with synthetic data, sometimes the best-performing model in LOOCV can have more states than actually generated the data. Figure 7 shows chance performance as a function of number of states. Several questions raised by these results can only be addressed using synthetic data. First, we will use synthetic data that has properties similar to the experimental data to directly assess the inferred model accuracy for both latency and activation patterns. To paint a more general picture, we will examine results over varying levels of signal-to-noise. Finally, the remainder of the article examines application to a range of synthetic data generated from different ground-truth state structures.
This will provide information about the kinds of experiments where we might expect accurate and useful models.

| HSMM-MVPA APPLIED TO SYNTHETIC DATA
Results from the experimental data suggest two main issues that need to be addressed: direct assessment of model-derived sojourn times and activity patterns, and determining the best number of states suggested by the LOOCV procedure. With the experimental data, evaluation of model goodness was based on how well task phase boundaries aligned with model-derived boundaries, and on the reasonableness of the state activity patterns. In this section, we transition from the experimental data and focus on simulated data with a 6-state ground truth. Doing so allows us to directly assess the goodness of discovered models by measuring how well models can capture the now known ground-truth state sojourn times and activity patterns.
We then consider more general 6-state models, and finally the determination of numbers of states.

| Direct assessment 1: Synthetic experimental data
Building on the experiment results, the next set of analyses uses the 6-state State-1-Separate model from Section 2 as the ground truth, generating synthetic datasets from its activity patterns and sojourn times. Note that in the case of the real experimental data there was no ground truth as to the true number of states, only the knowledge that some of the state boundaries should align with the behavioral phase boundaries. Here the goal is to assess directly how well the method recovers known generating parameters. Noise was generated as overlapping sums of samples from a t distribution: the t distribution provides the excess of extreme values, and temporal correlation is created because adjacent noise samples share some t samples in their sum.
The synthetic data, F(t), are created as a weighted sum of signal and noise,

F(t) = SNR × S(t) + N(t),

where S(t) is the signal implied by the generating brain signatures, N(t) is the noise, and SNR is the chosen signal-to-noise ratio. These values are then z-scored and have extreme values truncated to the range −5 to 5, just as described earlier for the actual data. This generates a matrix of 18,206 time points × 20 dimensions for the synthetic data, like the original data and with similar properties. Models can then be fit to the resulting data.
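A sketch of this generation scheme, with the moving-sum construction of the heavy-tailed correlated noise inferred from the description above (the window length k and the degrees of freedom are our assumptions, not values given in the text):

```python
import numpy as np

def synth_data(signal, snr, rng, k=3, df=5):
    """Mix a known signal with heavy-tailed, temporally correlated noise:
    noise at each time point is a moving sum of k t-distributed draws, so
    adjacent points share draws. F = snr * signal + noise, then z-scored
    and truncated to [-5, 5] as described in the text (sketch)."""
    n_points, n_dims = signal.shape
    draws = rng.standard_t(df, size=(n_points + k - 1, n_dims))
    # moving sum over k consecutive draws -> correlated, heavy-tailed noise
    noise = np.stack([draws[t:t + k].sum(axis=0) for t in range(n_points)])
    f = snr * signal + noise
    f = (f - f.mean(axis=0)) / f.std(axis=0)   # z-score each dimension
    return np.clip(f, -5.0, 5.0)               # truncate extreme values
```

Because each pair of adjacent time points shares k − 1 of its t draws, the noise has positive lag-1 autocorrelation, which is the property exploited later when correlated and uncorrelated noise are compared.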
We fit the 6-state model to synthetic data generated with signal-to-noise ratios from .001 to 1.0. Figure 10 shows the effect of SNR on various properties of the resultant model estimates. Figure 10a,b provide measures that point to a value of SNR that characterizes our experiment. Figure 10a shows how the trial-to-trial variability in estimated state sojourn times decreases with increasing SNR. It drops off sharply from a SNR of .01 to .1, reaching an asymptotic value close to the 1.52 trial-to-trial variability in state sojourn time that we estimated from the experimental data. Figure 10b shows how the computed root-mean-square (RMS) of the estimated brain signatures 11 increases with signal-to-noise ratio. The RMS of the brain signatures increases sharply after a SNR of .1, reflecting how the signal becomes dominated by the brain signatures. The RMS value we obtained for our brain signatures was .265. The circles in Figure 10a,b mark the values obtained from the experimental data.

Parts (c) and (d) of Figure 10 show measures of how well the HSMM-MVPA method can recover the ground truth (these are measures that will be reported for other synthetic datasets as well). Figure 10c shows how well the estimated brain signatures correlate with the generating brain signatures as a function of SNR. Figure 10d shows the accuracy in estimating the ground-truth state sojourn times, calculated as the error in the estimate divided by the true sojourn time. In both, there is a fairly large loss as SNR drops from .1 to .01. There is good success at recovering ground truth for a SNR of 0.1, which matches the properties of the fit to the original data (Figure 10a,b).
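The two recovery measures just described can be sketched as follows (a hypothetical helper; per-state signatures are taken as vectors and sojourn times in seconds):

```python
import numpy as np

def recovery_metrics(true_sigs, est_sigs, true_times, est_times):
    """Recovery measures of the kind reported for Figure 10c,d (sketch):
    mean correlation between true and estimated brain signatures, and
    mean absolute sojourn-time error as a proportion of the true time."""
    corrs = [np.corrcoef(t, e)[0, 1] for t, e in zip(true_sigs, est_sigs)]
    t = np.asarray(true_times, dtype=float)
    e = np.asarray(est_times, dtype=float)
    return float(np.mean(corrs)), float(np.mean(np.abs(e - t) / t))
```

Perfect recovery gives a mean correlation of 1 and a proportional timing error of 0.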
3.2 | Direct assessment 2: General 6-state model applied to 6-state ground truth

The synthetic data used in Section 3.1 were a rather special case of ground truth, both because they were based on the solution of an HSMM-MVPA and because they reflected the peculiarities of a particular experiment. To get a sense of more general performance, we generated 100 ground truths that were single-condition 12 experiments with 6 states. The sojourn times and brain signatures for these states were loosely in the same range as the experiment and reflected the potential of individual differences. State sojourn times: each state had a mean sojourn time, M_i, selected from a uniform distribution between 2 and 8 s.

Averaging over the 100 experiments at each SNR, Figure 11a shows the correlations of the generating brain signatures with the estimated brain signatures. Figure 11b shows the error in estimating the state sojourn times, calculated as the absolute error divided by the true sojourn time of that state. While performance is not as good as in the special case of Figure 10, Figure 11 shows relatively good recovery.

FIGURE 10 (a) Measure of the variability in state sojourn times; (b) measure of the magnitude of the brain signature; (c) correlation between estimated and true brain signatures; (d) accuracy in estimated state sojourn times.

3.3 | Assessing number of states found by LOOCV

Figure 11 illustrates fits of 6-state models to data where the ground truth is 6 states. Recall, however, that in the experimental data (Section 2), models with more than 6 states were discovered, with the maximal-likelihood model having 12 states. While the 12-state model did well capturing experiment phase boundaries, it did not do quite as well as fewer-state models. Here, we consider the issue of models that LOOCV has identified as best in terms of maximal likelihood, yet that have more than the six ground-truth states.
In order to understand how such models are to be interpreted, and the conditions of the data that can result in solutions with more than the ground-truth number of states, we simulated 4 sets of 50 datasets, each under different constraints. All datasets were generated similarly to those of the prior section and reported in Figure 11, but with the signal-to-noise ratio fixed at 0.1. We varied two features that we thought might produce more states than are in the ground truth. First, the temporal correlation in the noise might mean the procedure over-estimates the likelihood of excess states. Second, each partitioning assumes a whole scan is occupied by a state, even though averaging over partitions produces fractional state occupancy (Figure 4d). Extra states might be created to handle scans that are mixes of adjacent states. Therefore, the two factors we varied were whether the added noise was temporally correlated or not, and whether the 6 ground-truth states partially or wholly occupied the 2-s time points that make up individual trials. Figure 12a summarizes the maximal-likelihood number of states obtained from LOOCV for each of the 50 datasets generated with temporally correlated noise where ground-truth states were allowed to partially occupy 2-s time points. The simulated data in this cell reflect the same construction as that described for the simulation results shown in Figure 11. Using a threshold of 16 subjects predicted better (p < .01 under a sign test), each of the 50 synthetic datasets was better fit by models with 7-13 states, the mode being an 8-state solution (32% of the cases).
Examining the brain signatures estimated for the best state solutions selected by the LOOCV method, each of the six generating brain signatures was correlated with one or more of the estimated brain signatures. Extra states had brain signatures related to the original 6 in two ways:

Splits: In some cases, the better-fitting model had two states corresponding to one of the generating states. The brain signature for the first state was also correlated somewhat with the brain signature for the preceding state, while the brain signature for the second state was correlated somewhat with the subsequent state.
Bridges: In some cases, a state, typically 1 time point in length, was placed between two generating states. This state correlated relatively strongly with both of the adjacent states and captured the transition between them.
The HSMM-MVPA method assumes that whole states occupy time points even as it considers all ways of partitioning trials into sequences of time points. Some time points will contain a mixture of two states, and better fits can sometimes be obtained if a state is generated that is a blend of the two. Further, adjacent states include temporally correlated noise as well, also contributing to the need for bridge states. The remainder of Figure 12 shows results from the rest of the simulated data, illustrating the relative contributions of temporally correlated noise and partial occupancy of time points to the discovery of solutions with more states than ground truth. Figure 12b shows results for each of the 50 datasets generated where the noise was not temporally correlated but where ground-truth states were still allowed to partially occupy 2-s time points.
When correlated noise is not present, we are less likely to find better higher-state solutions; the modal outcome (42% of datasets) was the 6-state solution matching ground truth. Figure 12c shows results for the 50 simulated datasets with temporally correlated noise but where 2-s time points were wholly occupied. Similar to the partial-occupancy results in Figure 12a, 98% of the solutions had more than six states, with the 8-state solution most often best (40% of datasets). Finally, Figure 12d shows results for 50 simulated datasets generated with uncorrelated noise and whole time-point state occupancy.

To examine how the number of ground-truth states affects recovery, we generated cases where the ground truth varied from 2 to 18 states, looking only at cases with an even number of states. The brain signatures and sojourn times for the states were randomly generated as in the previous simulations, again using 20 subjects with 100 trials each.
For each choice of the number of states, 100 datasets were generated with an SNR of .1. Figure 13a shows the ability to recover the generating brain signatures, and Figure 13b shows the accuracy in recovering state sojourn times. Basically, the accuracy of recovery is high for the beginning of this range and decreases somewhat as the number of states grows.
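The style of synthetic-data generation used in these simulations can be sketched as follows; the dimensionality, gamma shape parameter, and SNR scaling here are illustrative assumptions rather than the exact settings of the simulations:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_trial(n_states, n_dims=20, mean_sojourn=4.0, tr=2.0, snr=0.1):
    """Generate one synthetic trial: a random brain signature per state,
    gamma-distributed sojourn times discretized to whole TRs, and white
    noise scaled so that signal variance / noise variance = `snr`."""
    signatures = rng.normal(size=(n_states, n_dims))
    # Gamma sojourn times (assumed shape 2), rounded to whole TRs of `tr` s,
    # with a minimum of one time point per state.
    sojourns = np.maximum(1, np.round(
        rng.gamma(shape=2.0, scale=mean_sojourn / 2.0, size=n_states) / tr
    )).astype(int)
    signal = np.repeat(signatures, sojourns, axis=0)   # one row per TR
    noise = rng.normal(scale=signal.std() / np.sqrt(snr), size=signal.shape)
    return signal + noise, sojourns

data, sojourns = make_trial(n_states=6)
```

Stacking many such trials over simulated subjects yields datasets like those summarized in Figures 12 and 13.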

| State recovery with multiple conditions
Except for the simulations of the data from the real experiment, both the data and the models have had just a single condition. Past experiments in which we have used HSMM-MVPA (e.g., Anderson et al., 2015; Anderson, Pyke, & Fincham, 2016; Anderson, Zhang, et al., 2016) have had multiple conditions, and we were interested in which states those conditions affect. We therefore simulated two experiments where the sojourn times of some states varied as a function of four conditions. Again, 100 datasets were generated with an SNR of .1, and the results for the two models are shown in Figure 15. Top panels: There were 4 states; the mean sojourn times of states 1, 3, and 4 did not vary with condition and were 3, 4, and 5 s, respectively, while state 2 lasted an average of .5, 2.5, 4.5, and 6.5 s in the four conditions.
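A sketch of how sojourn times for this first design might be drawn, with state 2's mean depending on condition (the gamma shape parameter is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Mean sojourn times (s) per state; state 2 (index 1) varies by condition.
FIXED_MEANS = {0: 3.0, 2: 4.0, 3: 5.0}
STATE2_MEANS = [0.5, 2.5, 4.5, 6.5]   # one mean per condition

def trial_sojourns(condition, shape=2.0):
    """Draw gamma-distributed sojourn times for one 4-state trial in
    the given condition (0-3)."""
    means = [FIXED_MEANS.get(s, STATE2_MEANS[condition]) for s in range(4)]
    return [rng.gamma(shape, m / shape) for m in means]

times = trial_sojourns(condition=3)
```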
Bottom panels: There were 5 states; the mean sojourn times of states 2, 3, and 4 did not vary with condition and were 3, 4, and 5 s, respectively, while states 1 and 5 each lasted either 1 or 4 s. Crossing these possibilities yielded the four conditions.

F I G U R E 1 3 Recovery of ground truth generating models with different numbers of states: (a) Correlations of recovered brain signatures with generating brain signatures; (b) mean proportion error in estimating state sojourn times; (c) proportion (out of 100 synthetic datasets) of convergence to local minima from neutral parameters. SNR = 0.1

F I G U R E 1 4 Accuracy in recovering the parameters of 6-state ground truth generating models with different mean state sojourn times: (a) Correlations of recovered brain signatures with generating brain signatures; (b) mean proportion error in estimating state sojourn times. SNR = 0.1

Figure 15 illustrates the fit of two types of models to each experiment. The less informed models (Figure 15a

| CONCLUSIONS
We have developed the HSMM-MVPA methodology on the assumption that the resultant models will accurately identify brain states and their sojourn times, allowing for meaningful mappings between brain activity and the corresponding cognitive processes. The critical output of the process is a parsing of a long trial into a sequence of periods of relatively constant mental activity. One can then examine the patterns of brain activity during these periods, as well as determine how the sojourn times of these states vary with experimental conditions. This article has been concerned with assessing how well the method does at identifying the states that drive systematic brain activity over the course of a trial, and with the factors that affect these results. While much of the presentation used synthetic data to identify where the HSMM-MVPA method gives good results, we started out by applying the approach to an experiment that had ground truth about where some of the state boundaries should lie. That ground truth was provided by the five phases of the experiment. The boundaries between these phases were identified fairly well by any HSMM-MVPA that had 6 or more states. Using a LOOCV criterion, there was evidence for up to 12 states (Figure 5). However, models with fewer than 12 states resulted in better identification of state boundaries (Figure 7).
The best-performing model in terms of an LOOCV can sometimes have more states than actually generated the data, as shown with synthetic data (Figure 12). If one has a firm idea of the number of states that should be present, the question of interest is whether one can accurately identify them in the imaging data. While there is never a guarantee that the peculiarities of one's imaging experiment will not lead to failure, the results were fairly good for SNRs greater than .05. The noise contributing to the SNR will be determined by the quality of scanning and by factors of experimental control. The signal contributing to the SNR will be determined by how strongly the individual states differ in their brain signatures.
State recovery decreases somewhat as one seeks to identify more states (Figure 13) and falls off quite sharply for states briefer than 2 s (Figure 14). There was success (Figure 15) at identifying states as brief as half a second in one condition, as long as there were other conditions in which the states were longer, enabling a good estimate of the state's brain signature.
Many cognitive tasks studied by fMRI average less than 2 s, and this methodology is not appropriate for them. We have developed complementary HSMM-MVPA methods for EEG, MEG, and ECoG that have shown success in parsing such brief tasks into states (e.g., Anderson, Pyke, & Fincham, 2016; Anderson, Zhang, et al., 2016; Anderson et al., 2018; Zhang, Walsh, & Anderson, 2017; Zhang, van Vugt, Borst, & Anderson, 2018). On the other hand, the current results indicate that one can be fairly optimistic about analyzing fMRI data from tasks in the 5-s to 1-min range, provided one is not trying to identify a great many brief states. This is where we have had success applying the method to study mathematical problem solving (e.g., Anderson, Pyke, & Fincham, 2016; Anderson, Zhang, et al., 2016). More generally, one might expect success in problem-solving and reasoning tasks that typically fall within that time range. These are important tasks in understanding complex human cognition and have been relatively neglected in neuroimaging. We hope researchers interested in such tasks will find these methods of use. In using these methods, researchers should keep in mind that the method may split single cognitive states into multiple states (Figure 12), and theoretical considerations might motivate choosing a smaller number of states.

| Analyses and models
The analyses and models in this article can be obtained at http://actr.psy.cmu.edu/?post_type=publications&p=31445.

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analyzed in this study.

APPENDIX
As discussed in the main body of the article, a series of processing steps transforms voxel time-series data into a form suitable for HSMM-MVPA processing. One critical step in this processing stream is generating an estimate of the underlying "activity signal" that drives the measured BOLD response. To accomplish this, time-series data from each scanning run are deconvolved using a Wiener filter (Glover, 1999) with a hemodynamic response point-spread function. The hemodynamic response function is the SPM difference of gammas (Friston et al., 2011: gamma(6,1) − gamma(16,1)/6).
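This deconvolution step can be illustrated with an FFT-based Wiener filter; the sampling grid, HRF length, and normalization below are simplifications of the actual pipeline:

```python
import numpy as np
from scipy.stats import gamma

def spm_hrf(tr=2.0, length=32.0):
    """SPM-style difference-of-gammas HRF: gamma(6,1) - gamma(16,1)/6,
    sampled at the scanning interval `tr` (seconds)."""
    t = np.arange(0, length, tr)
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

def wiener_deconvolve(bold, hrf, nsr=0.1):
    """Estimate the underlying activity signal from a BOLD time series.

    Divides out the HRF's transfer function, attenuated by the assumed
    noise-to-signal power ratio `nsr` (cf. Glover, 1999)."""
    n = len(bold)
    H = np.fft.fft(hrf, n)
    filt = np.conj(H) / (np.abs(H) ** 2 + nsr)
    return np.real(np.fft.ifft(np.fft.fft(bold) * filt))
```

Running a boxcar activity signal through the HRF and back through `wiener_deconvolve` recovers a smoothed version of the boxcar; smaller `nsr` values sharpen the estimate but amplify measurement noise.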
The Wiener filter requires an additional parameter reflecting an estimate of the noise-to-signal power ratio (NSR) in the data. In this article, and in other applications, we have used an NSR parameter of 0.1. Here we analyze the consequences of different choices.
The NSR parameter reflects noise introduced by the measurement of the BOLD signal (the scanner) and by subsequent processing steps, not noise associated with the underlying activity process that yields the BOLD response. There are a number of possibilities for estimating a reasonable starting point for the NSR (e.g., Welvaert & Rosseel, 2013). One possibility is the ratio of variances between a white matter region and a highly responsive brain region. For example, in the current dataset, the ratio of the variance of the BOLD signal in a predefined white matter region to the variance in a predefined PSPL region is 0.05. However, it is not clear what the best estimate might be. Here we show that the results obtained from an HSMM-MVPA analysis are quite robust over a fairly wide range of values for the NSR parameter, a range that includes the value of .1 used in the article.
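The white-matter heuristic can be computed directly from two regional time series; the variable names and synthetic series below are purely illustrative:

```python
import numpy as np

def estimate_nsr(white_matter_ts, responsive_ts):
    """Heuristic starting point for the NSR parameter: variance of the
    BOLD signal in a white-matter region (assumed to be mostly measurement
    noise) relative to its variance in a highly responsive region."""
    return np.var(white_matter_ts) / np.var(responsive_ts)

# Synthetic demonstration: a noise-only series versus one with four times
# the standard deviation (noise plus task-driven signal).
rng = np.random.default_rng(0)
white = rng.normal(scale=1.0, size=2000)
active = rng.normal(scale=4.0, size=2000)
nsr = estimate_nsr(white, active)   # close to 1/16
```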

| Parameter exploration
We explored a range of NSR values for the Wiener filter from .0001 to 1. For each value of the NSR, we recomputed Step 2 of the data-processing stream described in the main text, followed by the PCA processing in Step 3. The HSMM-MVPA procedure was applied to each resultant dataset. The measure of performance reported here aligns with the evaluation metrics described in the main text: how well we could predict behavioral task boundaries on a trial-by-trial basis. For each NSR, we computed the mean absolute deviations (MADs) between predicted and actual behavioral task boundaries. Figure A1 shows these results for three numbers of states: the 6-state solution, which was the minimal-state solution that performed well at predicting task boundaries; the 7-state solution, which was best in terms of prediction error; and the 12-state solution, which was best in terms of LOOCV. The behavioral reference line shows how well one could do using only information in the behavioral data. It was calculated using the mean position of the behavioral phase boundaries for each set of trials of a particular number of time points (varying from 9 to 26 time points).
It represents the smallest possible prediction error we could achieve using the behavioral data alone. The HSMM-MVPA analysis does not have this information but must discover what constitutes task boundaries and where they are.
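A simplified sketch of these two quantities, assuming all trials have the same number of time points (the full analysis groups trials by length before computing the reference):

```python
import numpy as np

def boundary_mad(predicted, actual):
    """Mean absolute deviation (in time points) between predicted and
    actual task-boundary positions, averaged over boundaries and trials.
    Both arguments have shape (n_trials, n_boundaries)."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.mean(np.abs(predicted - actual))

def behavioral_reference(actual):
    """Best achievable error from behavioral data alone: predict each
    boundary at its mean position across trials."""
    actual = np.asarray(actual, dtype=float)
    ref = np.broadcast_to(actual.mean(axis=0), actual.shape)
    return boundary_mad(ref, actual)
```

The HSMM-MVPA predictions are scored with `boundary_mad`, while `behavioral_reference` gives the dashed reference line that the model-free behavioral summary cannot beat.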
The results show relatively good performance in a range from about .005 to .15, with the best performance at roughly .05, which corresponds to the variance ratio between a white matter region and highly responsive brain regions. Performance at our chosen value of .1 is only marginally worse. All of these values yield prediction errors well below the behavioral reference value.
F I G U R E A 1 Error in estimating experiment phase boundaries for the 6-, 7-, and 12-state solutions as a function of the NSR used in Wiener filtering during preprocessing of the imaging data. The dashed Reference line shows the smallest possible prediction error when using behavioral data only