Investigating the temporal dynamics of electroencephalogram (EEG) microstates using recurrent neural networks

Abstract Electroencephalogram (EEG) microstates that represent quasi-stable, global neuronal activity are considered the building blocks of brain dynamics. Therefore, the analysis of microstate sequences is a promising approach to understanding the fast brain dynamics that underlie various mental processes. Recent studies suggest that EEG microstate sequences are non-Markovian and nonstationary, highlighting the importance of the sequential flow of information between different brain states. These findings inspired us to model these sequences using recurrent neural networks (RNNs) consisting of long-short-term-memory (LSTM) units to capture the complex temporal dependencies. Using an LSTM-based autoencoder framework and different encoding schemes, we modeled the microstate sequences at multiple time scales (200–2,000 ms), aiming to capture stably recurring microstate patterns within and across subjects. We show that RNNs can learn the underlying microstate patterns with high accuracy and that the microstate trajectories are subject invariant at shorter time scales (≤400 ms) and reproducible across sessions. A significant drop in the reconstruction accuracy was observed for longer sequence lengths of 2,000 ms. These findings indirectly corroborate earlier studies which indicated that EEG microstate sequences exhibit long-range dependencies with finite memory content. Furthermore, we find that the latent representations learned by the RNNs are sensitive to external stimulation such as stress, while the conventional univariate microstate measures (e.g., occurrence, mean duration, etc.) fail to capture such changes in brain dynamics. While RNNs cannot be configured to identify the specific discriminating patterns, they have the potential for learning the underlying temporal dynamics and are sensitive to sequence aberrations characterized by changes in mental processes.
Empowered with the macroscopic understanding of the temporal dynamics that extends beyond short‐term interactions, RNNs offer a reliable alternative for exploring system level brain dynamics using EEG microstate sequences.


| INTRODUCTION
Four quasi-stable states consistently explain around 80% of the total topographic variance in spontaneous electroencephalography (EEG).
These states are referred to as microstates and have been suggested to be the "building blocks of brain functions" (Khanna, Pascual-Leone, Michel, & Farzan, 2015; Koenig et al., 2002; Lehmann & Michel, 2011; Michel & Koenig, 2017). Recent studies show that changes in the properties of microstates (e.g., mean duration) are associated with neuro-psychiatric disorders (Michel & Koenig, 2017), for example, schizophrenia (Andreou et al., 2014; Lehmann et al., 2005; Rieger, Diaz Hernandez, Baenninger, & Koenig, 2016), depression (Damborská et al., 2019; Strik, Dierks, Becker, & Lehmann, 1995), epilepsy (Pittau, Baldini, Tomescu, Michel, & Seeck, 2018), as well as stages of development (Koenig et al., 2002). The continuous time course of microstate appearances exhibits long-range dependencies over at least six dyadic scales (Van de Ville, Britz, & Michel, 2010) and is interrelated with some of the well-known blood oxygen-level dependent (BOLD) resting state networks (Rajkumar et al., 2018; Van de Ville et al., 2010), effectively associating them, among others, with visual, auditory, and attention processes (Milz et al., 2016; Seitzman et al., 2017). Therefore, studying the trajectory of microstates, which represents whole-brain dynamics and seems to be governed by distinct but interconnected processes, is a promising avenue for investigating brain dynamics at the system level.
However, EEG trajectories are rarely investigated with proper mathematical tools capable of modeling the dynamics in a sequence-preserving way. More recently, modeling techniques based on hidden Markov models (Gschwind, Michel, & Van De Ville, 2015), random walks (von Wegner, Tagliazucchi, Brodbeck, & Laufs, 2016), and stochastic processes (von Wegner, Tagliazucchi, & Laufs, 2017) have been gaining momentum for investigating the transition properties of microstates.
Such approaches are nevertheless limited in terms of their dynamical richness (Gschwind et al., 2015). A fundamental issue with the transition matrix approach is the combinatorial increase of the number of possibilities when the length of sequences is extended, which leads to a sharp decrease in the reliability of models. Consequently, adequate modeling of microstates sequences needs to reach beyond simply the sequence of labels and should consider temporal relations within and between states (Gschwind et al., 2015).
In this article, instead of trying to explicitly model the temporal dynamics of EEG microstates, we ask if there are any temporal patterns in the sequence of EEG microstates that can be reliably and reproducibly detected. This question is best addressed using recurrent neural networks (RNNs) that are known to be a rich and flexible methodology to learn complex temporal dependencies without making any assumption on the temporal characteristics of the signal. Deep neural networks with a recurrent structure (i.e., RNNs) have been used successfully to model various temporal sequences (Cho et al., 2014;Sutskever, Vinyals, & Le, 2014). Unlike conventional feed-forward neural networks that consider all samples to be independent, RNNs have loops with a chain-like structure that dynamically engage information learned from the past to be used for future samples and therefore, have been employed to understand videos and temporal data with promising results (Cho et al., 2014;Venugopalan et al., 2014). Specifically, an RNN consisting of long-short-term-memory (LSTM) cells has been proven to be successful in modeling temporal dependencies in sequential data (Hochreiter & Schmidhuber, 1997;Sak, Senior, & Beaufays, 2014). With short-term memory that can last for a specific period of time, LSTM is well-suited to process, analyze, and predict sequential data with unknown time lags and durations.
Therefore, LSTM networks seem to be suitable for modeling patterns in EEG microstates that are quasi-stable and transient. While sequence-to-sequence autoencoders (AE) have been employed successfully in several tasks such as machine translation and video-to-text (Cho et al., 2014; Venugopalan et al., 2014), to the best of our knowledge, there is no previous work on extracting representations from EEG microstate sequences using this method.
The proposed model aims to learn the underlying patterns that exist in EEG microstate sequences where these states exhibit quasi-stability. The use of RNNs alleviates the need to predefine features, which aids in learning potential nonlinear microstate dynamics directly from the data. The LSTM architecture goes beyond the step-by-step short-term interactions (modeled by conventional methods like Markov chains) to capture potentially existing long-range dependencies. Specifically, we explore the temporal dynamics of the microstate sequences using an LSTM-based AE neural network that is trained to reconstruct its input at its output, by first compressing the input into a latent-space representation and then using this representation to reconstruct the output. The effectiveness of the proposed model in representing the complex dynamic structure is demonstrated through accurate reconstruction of microstate sequences. Further, we study the patterns learned by the RNNs by visualizing LSTM cells that react to specific patterns in microstate trajectories, to gain intuition into the internal learning mechanisms and the patterns of EEG microstates that govern their temporal structure. We show that the model reacts to the changes in microstate sequences after stress induction and that it has the ability to distinguish the stress-induced condition from baseline microstate sequences, which further demonstrates the potential of the learned representations for applications such as classification and cross-modality estimations. Finally, as RNNs have sequence prediction capability, we attempted to forecast future states in the microstate sequence by combining the historical sequence information with the learned internal representation. Relatively low prediction accuracies beyond a few milliseconds corroborate the nonstationary nature of the resting state microstate sequences due to the irregular structure of microstate durations.

| Data acquisition
Here, we used two datasets to assess the temporal structure of EEG microstates. The first dataset is from a study employing EEG with simultaneous functional magnetic resonance imaging (fMRI), where we have access to resting state data recorded pre- and post-stress, which is of interest for studying the effect of stress on the temporal properties of EEG microstates. Furthermore, possible shortcomings of artifact correction for EEG recorded inside the magnetic resonance imaging (MRI) scanner were controlled using the second dataset, which was recorded outside the MRI scanner.
Dataset 1A: Data were obtained from simultaneous 3T EEG-fMRI recordings of 12-min eyes-closed resting-state of 34 healthy male volunteers (mean age 44.06 ± 9.96). EEG data were acquired using the BrainAmp MR system (Brain Products) with a 64-channel Easycap augmented with six carbon-wire-loops (CWLs; van der Meer et al., 2016). One channel placed on the back was used for electrocardiogram (ECG) detection. FCz was used as reference electrode and CPz as ground electrode. The sampling rate was 5,000 Hz.

Dataset 1B: Data were obtained from simultaneous 3T EEG-fMRI recordings during a 12-min eyes-closed resting-state following a psychosocial stress paradigm in 34 male subjects (mean age = 43.26 ± 10.19). EEG and fMRI data were acquired with the same procedure as in Dataset 1A, with the only difference that resting-state data were recorded after subjects underwent a psychosocial stress paradigm.
The stress paradigm was an adapted version of the ScanSTRESS task as described by (Streit et al., 2014) where subjects were asked to perform two tasks containing mental rotation and arithmetic calculation.
For both types of tasks, control blocks were without any social evaluative feedback, time pressure, or difficult questions, whereas stress blocks included feedback about the correctness and speed of the answers as well as more demanding questions. During the whole experiment, subjects were exposed continuously to a video feed of the reactions of two panel members, who were passive during the control blocks but reacted disapprovingly to the participant's performance during the stress condition. In total, there were four 40-s blocks of each type (arithmetic control/stress and mental rotation control/stress) presented in two runs. After the first run (consisting of two blocks of each condition), the experiment was interrupted for extensive, negative, verbal feedback from the panel members stating that the performance was poor and more effort was necessary or the data would not be usable.

Dataset 1 was recorded in 39 subjects who underwent baseline and post psychosocial stress recordings in the scanner twice, that is, on 2 days, with a period of 7-35 days between the 2 days. The EEG acquisition was performed adjacent to a clinical trial (NCT02602275) where, after the baseline measurements, subjects took either placebo or an herbal medicinal product (Neurexan) in a counterbalanced order. Participants who received placebo on the first day took Neurexan on the second day and vice versa. This article only reports results analyzed for the days on which placebo was taken. The data from five subjects in each resting state were excluded from the current analysis because of problems with the recording and/or the low quality of their available EEG data.
Dataset 2: EEG data were obtained in a shielded cabin from 15-min eyes-closed resting-state in 11 healthy male volunteers (mean age = 24.42 ± 3.05). EEG data were acquired using a BrainAmp (Brain Products) with 64-channel actiCap. Two bipolar electrodes placed on the right and the left arm were used for ECG detection and two further bipolar electrodes with galvanic skin response-module input were used for skin conductance measurements. FCz was used as reference electrode and AFz as ground electrode. The electrode PO10 was used for eye movement detection and was excluded from the further analysis. The sampling rate was 2,500 Hz, but data were down-sampled to 1,000 Hz.

| Data preprocessing
Artifact rejection for EEG data was done in a semiautomatic process using custom MATLAB scripts. First, the raw EEG data was bandpass filtered between 0.3 and 200 Hz. Then, EEG was cleaned from MRI gradient artifacts by motion informed template subtraction technique (Moosmann et al., 2009). Then, the helium pump and ballisto-cardiac artifacts were removed using the CWLs artifact correction technique (van der Meer et al., 2016). Finally, the data were segmented into 2 and 1 s trials, for Datasets 1 and 2 respectively, and the trials containing muscle and head movement artifacts were removed from the dataset. Data segmentation was necessary to allow removal of the data containing muscle and head-movement-related artefacts.
Because Dataset 1 contains MRI artefact, we aligned EEG segmentation to the repetition time of the simultaneous MRI scanner. Dataset 2 was recorded outside the scanner and therefore we were able to use a finer segmentation (here 1 s), that allows us to be more precise in the artefact removal and keep more data.
The channels that contained too many epochs with artifacts were also removed and interpolated using routines provided by EEGLAB (Delorme & Makeig, 2004). The latter step of the artifact rejection process also includes independent component analysis decomposition of the EEG data and removing the components that reflected eye movements, continuous muscle activity, and residual MRI-artifacts.
Note that since Dataset 2 was recorded outside of the scanner, we only applied the non-MRI part of the pipeline on this dataset.

| Microstate extraction
To extract EEG microstates, we used the EEGLAB plugin developed by Thomas Koenig (www.thomaskoenig.ch/index.php/software/). Artifact-free EEG was band-pass filtered between 1 and 40 Hz, down-sampled to 250 Hz, and the peaks of the global field power (GFP) were determined after convolving the GFP time series with a Gaussian filter of 10 time-points window length. We use GFP peaks to assign the microstate labels as they are suggested to be the best representation of instantaneous EEG topographies (Koenig, Studer, Hubl, Melie, & Strik, 2005). All maps marked as GFP peaks were extracted and submitted to a modified k-means clustering algorithm to deduce the four classes of map topographies that maximally explain the variance of the map topographies. These four classes of map topographies were then submitted to a full permutation procedure to compute mean classes across participants. The full permutation procedure is a permutation algorithm dedicated to maximizing the common variance over subjects. This is done in an iterative procedure by swapping individual microstate topographies for the best fit of the prototype maps and updating the prototypes by calculating the grand average over subjects (see Koenig et al., 1999 for further details). Using the mean microstate classes across subjects as templates, for all participants the EEG topographies at the local maxima (peaks) of the GFP were assigned to one of these four microstate classes based on maximal Pearson correlation (see Figure 1). Time points between GFP peaks were assigned to the microstate class of the temporally closest GFP peak. Successive maps assigned to the same class were recognized as belonging to one microstate.
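For illustration, the label assignment step can be sketched in a few lines of Python. The helper names (`gfp`, `pearson`, `assign_label`) are ours, channel data are assumed to be plain lists, and polarity-invariant correlation is used, as is conventional in microstate analysis; the toolbox implementation may differ in detail:

```python
import math

def gfp(topography):
    """Global field power: the spatial standard deviation across channels."""
    m = sum(topography) / len(topography)
    return math.sqrt(sum((v - m) ** 2 for v in topography) / len(topography))

def pearson(x, y):
    """Plain Pearson correlation between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

def assign_label(topography, templates):
    """Assign the template class with maximal (polarity-invariant)
    Pearson correlation to a single GFP-peak topography."""
    return max(templates, key=lambda k: abs(pearson(topography, templates[k])))
```

In a full pipeline, `assign_label` would be applied at each GFP peak, and the intervening time points would inherit the label of the temporally closest peak.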
Finally, the temporal dynamics of microstates are conventionally quantified in terms of the average duration of microstates each time they occur (i.e., Duration), the number of times they occur in a second (i.e., Occurrence), and the proportion of time spent in each microstate (i.e., Contribution; Khanna et al., 2015).
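As a concrete illustration, these conventional measures can be computed directly from a label sequence; the following is a minimal sketch with a hypothetical function name, assuming the 250 Hz label series used here:

```python
def microstate_measures(labels, sfreq=250):
    """Compute mean Duration (s), Occurrence (1/s), and Contribution
    for each microstate class from a label sequence sampled at sfreq Hz."""
    # collapse the sequence into runs of identical labels
    runs = []
    for lab in labels:
        if runs and runs[-1][0] == lab:
            runs[-1][1] += 1
        else:
            runs.append([lab, 1])
    total_s = len(labels) / sfreq
    out = {}
    for state in sorted(set(labels)):
        state_runs = [n for lab, n in runs if lab == state]
        out[state] = {
            "duration": (sum(state_runs) / len(state_runs)) / sfreq,
            "occurrence": len(state_runs) / total_s,
            "contribution": sum(state_runs) / len(labels),
        }
    return out
```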

| Reconstruction of microstate sequences using RNN
We use a recurrent sequence-to-sequence AE framework (see Figure 2). The encoder processes the input microstate sequence I_1, I_2, …, I_T of length T and summarizes the observed temporal pattern in the form of a latent representation (hidden state). The task of the decoder is the state-by-state reconstruction of the input microstate sequence.
The reconstruction is based on the latent representation learned by the encoder RNN which is used to initialize the hidden states of the decoder RNN (see Figure 2).
In this work, we employ an architecture similar to the LSTM-based AE that was first employed by Srivastava, Mansimov, & Salakhudinov (2015) to learn representations of spatiotemporal information in video sequences. As illustrated in Figure S1, each LSTM unit has a memory cell and a set of gates that control the flow of information. A chain of such LSTM units is organized into an Encoder-Decoder architecture (see Figure 2) to learn the latent representation of microstate sequences that not only summarizes the high-level patterns contained in the microstate sequences but also learns the temporal dependencies between subsequent states.
In particular, the LSTM-based AE is fed with microstate sequences to encode temporal patterns that are stable across and within subjects. As the EEG microstate is a categorical variable with labels A to D, one-hot encoding is used to first convert it into numerical form. Here, each label is mapped to a binary vector with a single nonzero entry, (1,0,0,0), (0,1,0,0), (0,0,1,0), or (0,0,0,1), such that the pairwise distances between all microstates are the same. At each step, the network receives a sequence of EEG microstates and generates an output sequence of the same size. The EEG sequence is sliced into segments of size S, which represents the length of the sequence in time points, and the input to the LSTM network has dimensions of S × 4 due to one-hot encoding.
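The encoding and slicing described above can be sketched as follows; the function names and the overlap step are illustrative, not the authors' implementation:

```python
def one_hot(sequence, alphabet="ABCD"):
    """Map a categorical microstate sequence to binary vectors so that
    pairwise distances between all states are equal."""
    index = {s: i for i, s in enumerate(alphabet)}
    return [[1 if index[lab] == j else 0 for j in range(len(alphabet))]
            for lab in sequence]

def slice_segments(encoded, S, step=None):
    """Slice an encoded sequence into (optionally overlapping) segments of
    length S; each segment has shape S x 4, as fed to the LSTM network."""
    step = step or S
    return [encoded[i:i + S] for i in range(0, len(encoded) - S + 1, step)]
```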

| Intermediate representation
Technically, only four symbols are enough to represent the microstates, where consecutive appearances of a microstate represent its local persistence in a sequence. However, the duration of a microstate is very irregular and ranges from 0 to 300 ms. As persistence could be a dominating factor that overwhelms the training process and is also shown to be an important encoding feature of microstates (Khanna et al., 2015), alternative encodings were considered to include this information in the samples. The most compact representation that encodes the temporal information would be to encode each microstate along with its persistence as a unique symbol, as shown in Figure 3. However, due to the heavy-tailed distributions of EEG microstate durations (Gschwind et al., 2015), such a condensed representation would lead to a large number of symbols in the new representation, which would make it too sparse compared to the number of instances and therefore unable to reflect the distribution characteristics of the sequence correctly. For example, although microstate B very rarely occurs with a duration of 100 ms, it still gets a unique symbol under the condensed representation. To overcome this issue, we adopt an intermediate encoding scheme (as shown in Figure 3) that represents a compromise between the original and compact representations.
Although many such intermediate representations are possible, we chose to encode microstates persisting up to four time points with different symbols (A1-A4, B1-B4, etc.) and string them in decreasing order (e.g., D9 is coded as D4 D4 D1). This representation offers a tradeoff between recurrence and sparsity as the number of symbols only increases from four to sixteen while sustaining the notion of persistence. Importantly, this allows us to dissect the possible effect of the limited memory capacity of the RNNs from the intrinsic temporal structure of EEG microstates, should the reconstruction accuracy differ for various microstate sequence durations.
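A minimal sketch of this intermediate encoding (run lengths capped at four time points, chunks emitted in decreasing order) could look as follows; the function name is ours:

```python
def intermediate_encode(sequence, cap=4):
    """Re-encode a microstate sequence: each run of identical states is
    split into duration-tagged symbols of at most `cap` time points,
    in decreasing order (e.g., a run of nine D's -> ['D4', 'D4', 'D1'])."""
    symbols, i = [], 0
    while i < len(sequence):
        # find the extent of the current run of identical labels
        j = i
        while j < len(sequence) and sequence[j] == sequence[i]:
            j += 1
        run = j - i
        # emit capped chunks in decreasing order
        while run > 0:
            chunk = min(run, cap)
            symbols.append(f"{sequence[i]}{chunk}")
            run -= chunk
        i = j
    return symbols
```

With four states and `cap=4`, the alphabet grows only fourfold, from four to sixteen symbols, as stated above.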

| Generation of surrogate data
To further validate that the proposed LSTM networks are in fact learning the underlying patterns in EEG sequences, the experiments were repeated on two types of surrogate datasets. First, random sequences of microstates were generated where each state was independently sampled with equal probability of occurrence. Second, for a more realistic imitation, a sequence of microstates was modeled as a discrete autoregressive process (Jacobs & Lewis, 1983), where the state at time t is a function of previous states and thus introduces correlations.

FIGURE 2 Encoder-Decoder architecture of the long-short-term memory (LSTM) network. The architecture employs two LSTM networks called the encoder and decoder. The encoder is a single layered recurrent neural network (RNN) with N_u units of LSTM. At each time step, the hidden state of the encoder is updated based on the input microstate. Therefore, the final hidden state of the encoder RNN contains information about the whole input sequence. This final hidden state is used by the decoder RNN to recreate the original sequence by minimizing the reconstruction error. The decoder RNN is structurally similar to the encoder with the same number of layers and LSTM units.

FIGURE 3 Different microstate representations that were used to encode long-term dependencies. The original representation is the most intuitive encoding scheme and requires only four symbols. On the other extreme, the condensed representation is sparse but also minimalistic. The intermediate representation provides a tradeoff between these two by assigning different symbols for microstate durations up to four time points and arranging them in decreasing order. With only a fourfold increase in the number of symbols, the intermediate representation strikes a balance between recurrence and sparsity.
In this case, the microstate sequence is assumed to be DAR(p): x_n = V_n · x_{n−A_n} + (1 − V_n) · y_n, where x_n ∈ {A, B, C, D} is the nth state in the sequence, V_n is a Bernoulli process taking the value 1 with probability ρ and 0 with probability (1 − ρ), A_n is an integer between 1 and p attaining value i with probability α_i, and y_n is another random process with independent and identically distributed probabilities of selecting a particular state, represented by the marginal distribution π.
The parameters of the autoregressive process are estimated from the original EEG microstate sequences by first mapping the symbolic states onto a set of numerical values.
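Under the stated assumptions, a DAR(p) surrogate generator can be sketched as follows; parameter estimation is omitted, and the function name and the seeding of the initial history with marginal draws are illustrative choices:

```python
import random

def dar_surrogate(length, rho, alpha, pi, states="ABCD", seed=0):
    """Generate a DAR(p) surrogate: x_n = V_n * x_{n - A_n} + (1 - V_n) * y_n,
    with V_n ~ Bernoulli(rho), A_n a lag in 1..p drawn with weights alpha,
    and y_n drawn from the marginal distribution pi over the states."""
    rng = random.Random(seed)
    p = len(alpha)
    # seed the history with independent draws from the marginal distribution
    x = rng.choices(states, weights=pi, k=p)
    for n in range(p, length + p):
        if rng.random() < rho:
            # V_n = 1: copy the state A_n steps back
            lag = rng.choices(range(1, p + 1), weights=alpha)[0]
            x.append(x[n - lag])
        else:
            # V_n = 0: fresh independent draw y_n
            x.append(rng.choices(states, weights=pi)[0])
    return x[p:]
```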

| Reconstruction of microstates sequences within each subject
We first analyzed Datasets 1A and 2 within an intrasubject framework where data from each subject were separately used for reconstruction. Then each set was further divided into overlapping segments of preset length and randomly shuffled. We repeat this procedure for the intermediate representation as well. The performance of the models is measured using reconstruction accuracy, which is defined as:

Reconstruction Accuracy = (number of correctly predicted microstates) / (length of the sequence in time points)

The mean reconstruction accuracies across all subjects are depicted in Figure 4. Note that because, within a microstate, periods between GFP peaks were labeled using a nearest-neighbor criterion, within an average of about half the average GFP peak-to-peak interval, EEG time points receive the same microstate label by definition.
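The reconstruction accuracy defined above amounts to a per-time-point agreement score between the input and the reconstructed label sequences; a minimal sketch:

```python
def reconstruction_accuracy(original, reconstructed):
    """Fraction of time points whose microstate label is exactly reproduced."""
    assert len(original) == len(reconstructed)
    hits = sum(o == r for o, r in zip(original, reconstructed))
    return hits / len(original)
```

With four equiprobable states, guessing yields an expected accuracy of 0.25, the chance level used for the surrogate comparison below.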

| Analysis of surrogate data
Reconstruction accuracy for random sequences is significantly lower than for real EEG sequences (Figure 4). While the accuracy for completely random sequences (with equal probability of occurrence for each microstate) is close to the chance level of 25%, accuracies for autocorrelated random sequences are slightly higher.

| Comparison of microstate sequences across subjects
Here, further analysis is done to test the extent to which microstate sequences are comparable across subjects. To do this, an intersubject analysis is conducted where microstate sequences extracted from 80% of the subjects were used to train the RNNs and data from the remaining 20% of the subjects were used for testing the performance of the trained models, with fivefold cross validation (Figure 4).

| Visualization and Interpretation of LSTM Cells
RNNs, composed of a large number of individual cells combined in complex ways to solve challenging tasks, remain largely black boxes.
With the proliferation of large-scale neural networks, interpreting them has become one of the most challenging and active areas of research (Karpathy, Johnson, & Fei-Fei, 2015). Although many LSTM cells were too complex for visual interpretation, we were able to find multiple interpretable cells that robustly identified high-level patterns. Remarkably, some of these patterns correlated with the transition dynamics of EEG microstates. For example, Figure 5 depicts the activation of a cell that is sensitive to the occurrence of State A and another cell that was found to track transitions to and from States C and D (Figure 5). For the purpose of visualization and ease of interpretation, these RNNs were trained with microstate sequences of length 100 in the original representation.
For a more objective interpretation of these cells, we looked at how the activations were correlated with the temporal dynamics of the sequences. As a measure of state transition probability, the average activation value for each transition (across a batch of sequences) for each cell N is calculated as

T_{N:X→Y} = (1/n) · Σ_{i=1}^{n} A_{(X→Y)_i},

where A represents the cell's activation or intensity and n is the total number of transitions that occur from a state X to a state Y. The average activation value for each state is also calculated as a measure of the rate of occurrence of a particular microstate, that is,

S_{N:X} = (1/n) · Σ_{i=1}^{n} A_{X_i}.

Figure 6 depicts the activation metrics calculated for the two cells presented in Figure 5. The visualizations clearly show the responsiveness of LSTM units to multiple patterns. As can be seen from Figure 6, the activation metrics indicate the specialization of that cell in tracking transitions to and from States A and B toward States C and D.
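These activation metrics can be computed directly from a cell's activation trace; the following sketch (with hypothetical function names, averaging the cell's activation at the time point where a state or transition occurs) illustrates the calculation:

```python
def transition_activation(sequence, activations, src, dst):
    """Average activation of one LSTM cell over all src -> dst transitions,
    taking the activation at the time point the destination state appears."""
    vals = [a for prev, cur, a in zip(sequence, sequence[1:], activations[1:])
            if prev == src and cur == dst]
    return sum(vals) / len(vals) if vals else 0.0

def state_activation(sequence, activations, state):
    """Average activation of one LSTM cell over all occurrences of a state."""
    vals = [a for lab, a in zip(sequence, activations) if lab == state]
    return sum(vals) / len(vals) if vals else 0.0
```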

| Effect of social stress on trajectories of microstates
Several studies have shown that the temporal dynamics of EEG microstate sequences are altered by disturbances of mental processes. Here, we analyzed resting state data following an exposure to a psychosocial stress task compared to baseline resting state. The stress response to the psychosocial stress is induced using the ScanSTRESS paradigm (Streit et al., 2014), which uses arithmetic as well as mental rotation tasks (see Section 2.1 for details).

FIGURE 7 Block diagram of joint stress classification along with reconstruction. The architecture is the same as the one used for reconstruction, but here the system is coupled with a pattern classifier which classifies the two conditions. Here, we back propagate both reconstruction and classification errors at each epoch.
Here, we hypothesize that since stress suppresses certain modes of activity in the brain (Olver, Pinney, Maruff, & Norman, 2015; Sandi, 2013; Yu, 2016), resting state data before the stress condition (baseline resting state) should have a richer repertoire of microstate sequences compared to the resting state following the stress task. To test this hypothesis, we tested the ability of the RNNs to classify the sequences as stress or nonstress. As shown earlier (see Section 3.3), the time course of microstates seems to comprise multiple time scales where shorter time scales are subject-invariant. Therefore, we repeated this experiment with four different sequence lengths of duration 200, 400, 800, and 1,600 ms, respectively. We used an RNN Encoder-Decoder coupled to a pattern classifier as shown in Figure 7. Consistent with the theory that functional brain states are suppressed under stress conditions, we observe that RNN encodings obtained from joint training can distinguish between EEG with or without a preceding stress condition. Interestingly, the plot shows a clear trend of increasing accuracy with increasing sequence lengths and thus emphasizes the importance of long-range correlations in characterizing these sequences. Importantly, the effect of stress was not significant in any of the conventional measures (see Figure 8).
Moreover, we attempted to classify stress versus nonstress for 1,600 ms sequences using a simple neural network with conventional measures as input features. An architecture similar to that used for joint stress classification resulted in an accuracy of approximately 52% only.

FIGURE 8 Transition probabilities, average duration, coverage, and occurrence frequency of microstates for baseline and stress conditions are compared. These results show that there are no statistically significant differences in any of the features between the two conditions.
To further illustrate the sensitivity of LSTMs to the trajectories of microstates, we conducted a cross-conditional analysis. We tested the generalizability of the RNNs when trained on one of the two datasets (with or without preceding stress) and tested on the other. The summary of the results is shown in Appendix E. We observed that although the two models have a similar performance when each is tested on the same session it was trained on (p < .43), the test accuracy of the model trained on data with preceding stress is significantly (p < .023) lower when tested on the session with no preceding stress. Interestingly, this drop in accuracy is only observed when we cross-tested the two models trained on the longer sequence length of 800 ms. These results suggest that RNNs trained on EEG data following the stress condition have less ability to generalize to data with no preceding stress, further substantiating the idea of suppressed brain states under stress conditions.

| DISCUSSION
Microstates denote the quasi-stable topography of scalp EEG that remains constant for approximately 80 ms and are believed to be the building blocks of an adaptive chain of neuro-cognitive states. Therefore, the trajectories of microstates are expected to follow certain (yet unknown) patterns and hence be trackable. Previous studies on EEG microstate transitions have indeed shown that the matrix of transitions among microstates is nonrandom, asymmetric, and systematically affected in different neuropsychiatric disorders (Lehmann et al., 2005; Nishida et al., 2013). Here, we used RNNs to capture the dynamics of microstate transitions with high temporal resolution. We capitalized on the ability of RNNs to recover consistent patterns of EEG microstates and provided insights into the temporal characteristics of the microstate sequences at different sequence lengths. Extensive experiments demonstrated that RNNs can capture the underlying structure of microstate sequences with high accuracy.
We observed that microstate trajectories are largely subject-invariant at short time scales (≤400 ms) and that the reconstruction accuracy of the model decreases gradually with increasing sequence lengths. In particular, our results suggest that for longer sequence lengths (2,000 ms), the accuracies for either intra- or intersubject reconstructions do not differ significantly from random autocorrelated sequences. Interestingly, this approximation is in line with both Gschwind et al. (2015) and von Wegner et al. (2016), which estimate the long-range memory effects to last up to 1,000 ms in EEG microstate sequences, suggesting the existence of long-range correlations with finite memory content. Moreover, the increasing trend in the classification accuracy of the stress condition from 200 to 1,600 ms further emphasizes the importance of long-range dependence in characterizing the condition-specific features of these sequences. Collectively, these results suggest multiscale temporal dynamics of microstate sequences, where microstate sequences at shorter time scales are subject-invariant and therefore possibly reflect mainly primary sensory information processes rather than high-level cognitive processing, which is likely to be coded in longer sequences.
This observation is in line with recent findings relating the appearance of each microstate to specific functional networks and showing that the temporal characteristics of certain microstates can be manipulated by certain tasks (Britz, Diaz Hernandez, Ro, & Michel, 2014; Seitzman et al., 2017). Taken together, these results provide converging evidence for the complex relation between the functional relevance of microstates and their sequences, further consolidating the proposal that microstates are the building blocks of sequences that manifest cognitive communication in the brain (Michel & Koenig, 2017).
The bursting behavior and long-range temporal dynamics of EEG microstate sequences have been demonstrated in detail in works such as Gschwind et al. (2015). Our preliminary experiments confirmed that our datasets exhibit similar properties (Appendix F). While long-range dependencies play an important role in effectively encoding a given sequence for both reconstruction and prediction, the task of a predictor is significantly more challenging, as it needs to forecast future microstates based only on past information.
We hypothesized that the burstiness of the sequences increases the chance of incorrect predictions, causing the predicted sequence to diverge rapidly from the original.
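Burstiness of a microstate sequence can be quantified, for instance, by computing the dwell times (run lengths of consecutive identical labels) and applying the Goh–Barabási burstiness coefficient B = (σ − μ)/(σ + μ). This is one common choice for illustration, not necessarily the measure used in our analysis; the toy sequence below is hypothetical.

```python
import numpy as np

def dwell_times(seq):
    """Run lengths (in samples) of consecutive identical microstate labels."""
    seq = np.asarray(seq)
    change = np.flatnonzero(np.diff(seq) != 0) + 1       # indices where the label switches
    bounds = np.concatenate(([0], change, [len(seq)]))
    return np.diff(bounds)

def burstiness(durations):
    """Goh-Barabasi coefficient: -1 (periodic) .. 0 (Poisson-like) .. +1 (bursty)."""
    mu, sigma = np.mean(durations), np.std(durations)
    return (sigma - mu) / (sigma + mu)

d = dwell_times([0, 0, 0, 1, 2, 2, 2, 2, 1, 1])
print(list(d))                 # [3, 1, 4, 2]
print(round(burstiness(d), 3))
```

Strongly bursty dwell-time distributions (B approaching 1) imply long runs punctuated by rapid switching, which is exactly the regime in which a recursive predictor can be driven off the true trajectory by a single wrong transition.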

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of this article.

APPENDIX C: EEG MICROSTATE SEQUENCE PREDICTIONS
As in the reconstruction model, the encoder RNN analyzes the pattern underlying the past microstate trajectory using LSTMs, but here the task of the decoder is modified to forecast the future states.
The encoder passes the learned representation to the decoder, where it is used to initialize the state of the decoder module for sequence prediction. After being initiated with a dummy input at the first step, the decoder recursively generates the output sequence O1, O2, …, OT′ of desired length T′. Again, the decoder used in prediction is conditional in nature: at every step, it feeds the output Ot−1 obtained in the previous step as the input for the current update. The motivation to use conditional decoding is twofold: first, it allows the decoder to learn multiple target sequence distributions (Srivastava, Mansimov, & Salakhudinov, 2015), which is necessary since more than one target can exist for a given input sequence; and second, the data have strong short-range correlations, which are best modeled by a conditional predictor. We trained this model using a microstate sequence length of 100 (400 ms) for each subject to predict the next 400 ms (see Figure S3). For the intermediate representation, as one-time-step prediction is not possible, we predicted 1 and 5 time steps, corresponding to approximately 20 and 80 ms. Additionally, the prediction accuracy decreases gradually as the length of the predicted sequence increases, but it remains stably above chance level (see Appendix B).
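The encoder-decoder data flow described above (encoder state handed to the decoder, dummy first input, previous output fed back conditionally) can be sketched as follows. For brevity the LSTM cells are replaced by vanilla tanh recurrences and all weights are untrained random placeholders, so this illustrates only the wiring, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, HIDDEN = 4, 16          # 4 microstate classes; hidden size is arbitrary

# placeholder weights; in the actual model these are trained LSTM parameters
W_xh = rng.normal(scale=0.1, size=(N_STATES, HIDDEN))
W_hh = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
W_hy = rng.normal(scale=0.1, size=(HIDDEN, N_STATES))

def one_hot(label):
    v = np.zeros(N_STATES)
    v[label] = 1.0
    return v

def cell(x, h):
    """Vanilla tanh recurrence standing in for an LSTM update."""
    return np.tanh(x @ W_xh + h @ W_hh)

def encode(seq):
    """Encoder: compress the past microstate trajectory into a final hidden state."""
    h = np.zeros(HIDDEN)
    for label in seq:
        h = cell(one_hot(label), h)
    return h

def decode(h, T_out):
    """Conditional decoder: dummy input at the first step, then feed back O_{t-1}."""
    outputs, x = [], np.zeros(N_STATES)     # all-zero dummy first input
    for _ in range(T_out):
        h = cell(x, h)
        o = int(np.argmax(h @ W_hy))        # predicted microstate at this step
        outputs.append(o)
        x = one_hot(o)                      # conditional feedback of the output
    return outputs

past = [0, 0, 1, 1, 2, 3, 3, 0]
pred = decode(encode(past), T_out=8)
print(pred)  # eight predicted microstate labels in {0, 1, 2, 3}
```

The key structural point is in `decode`: each step's input is the previous step's output rather than ground truth, which is what makes the decoder conditional and allows errors to compound in bursty sequences.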

APPENDIX D: ABLATION-PRELIMINARY RESULTS
The performance of an LSTM network depends on a number of parameters inherent to the architecture. Owing to limited computational resources, exhaustive ablation studies could not be performed. However, a preliminary sensitivity analysis (see Figure S4) was conducted by varying the number of hidden layers and the number of units per layer.
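Such a sensitivity analysis amounts to a small grid search over depth and width with all other settings fixed. The specific values below are hypothetical stand-ins, not the grid actually reported in Figure S4.

```python
from itertools import product

# hypothetical search space: vary depth and width, keep everything else fixed
layer_options = [1, 2, 3]         # number of stacked LSTM layers
unit_options = [32, 64, 128]      # units per layer

grid = list(product(layer_options, unit_options))
for n_layers, n_units in grid:
    # a train_and_score(n_layers, n_units) call would fit the auto-encoder
    # and record its reconstruction accuracy; omitted here
    pass

print(len(grid))  # 9 configurations
```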
These findings suggest that the lower reconstruction accuracies for longer sequence lengths are due to the inherent nature of the EEG sequences rather than to a limitation of the LSTM network parameters.

APPENDIX E: CROSS-CONDITIONAL ANALYSIS
See Supporting Information Table S1.
APPENDIX F: BURSTINESS AND LONG-RANGE DEPENDENCE
See Figure S5.