Decoding and mapping task states of the human brain via deep learning

Abstract Support vector machine (SVM)-based multivariate pattern analysis (MVPA) has delivered promising performance in decoding specific task states from functional magnetic resonance imaging (fMRI) of the human brain. Conventionally, SVM-MVPA requires careful feature selection/extraction according to expert knowledge. In this study, we propose a deep neural network (DNN) for directly decoding multiple brain task states from fMRI signals without the burden of handcrafting features. We trained and tested the DNN classifier using task fMRI data from the Human Connectome Project's S1200 dataset (N = 1,034). In tests to verify its performance, the proposed classification method identified seven tasks with an average accuracy of 93.7%. We also showed the general applicability of the DNN for transfer learning to small datasets (N = 43), a situation encountered in typical neuroscience research. The proposed method achieved average accuracies of 89.0% and 94.7% on a working memory task and a motor classification task, respectively, higher than the accuracies of 69.2% and 68.6% obtained by the SVM-MVPA. A network visualization analysis showed that the DNN automatically detected features from areas of the brain related to each task. Without incurring the burden of handcrafting features, the proposed deep decoding method can classify brain task states highly accurately, and is a powerful tool for fMRI researchers.

KEYWORDS: brain decoding, deep learning, functional brain mapping, functional magnetic resonance imaging, Human Connectome Project, transfer learning

1 | INTRODUCTION

For years, researchers have been attempting to decode and identify functions of the human brain based on functional brain imaging data (Dehaene et al., 1998; Haynes & Rees, 2006; Jang, Plis, Calhoun, & Lee, 2017; Poldrack, Halchenko, & Hanson, 2009; Rubin et al., 2017). The most popular among these brain-decoding methods is the support vector machine (SVM)-based multi-voxel pattern analysis (MVPA), a supervised technique that incorporates information from multiple variables at the same time (Kim & Oertzen, 2018; Kriegeskorte & Bandettini, 2007; Kriegeskorte, Goebel, & Bandettini, 2006; Norman, Polyn, Detre, & Haxby, 2006). Despite its popularity, the SVM struggles to perform well on high-dimensional raw data, and requires the expert use of design techniques for feature selection/extraction (LeCun, Bengio, & Hinton, 2015; Vieira, Pinaya, & Mechelli, 2017). Thus, we explore in this study an open-ended brain decoder that uses whole-brain neuroimaging data on humans.
In recent years, the deep neural network (DNN), a family of model-free machine learning methods, has performed well in abstracting representations of high-dimensional data (LeCun et al., 2015). The hierarchical structure of a DNN with nonlinear activation functions enables the learning of more complex output functions than those learnable by traditional machine learning methods, and one that can be trained end to end. DNNs have already yielded remarkable results in medical image analyses (Cichy & Kaiser, 2019; Shen, Wu, & Suk, 2017; Vieira et al., 2017). Given these characteristics, a DNN classifier may be suited to classifying brain states directly from a massive whole-brain fMRI time series without requiring feature selection.
Deep learning methods are effective when massive amounts of data are available for training. However, under controlled conditions, most typical neuroimaging studies have collected data from only tens to hundreds of subjects, with the purpose of identifying minor differences between different states (Horikawa & Kamitani, 2017) or groups thereof (Vieira et al., 2017). An applicable brain decoder should be able to identify these differences even with a limited amount of data. Transfer learning is widely used for training DNNs with limited medical data (Sharif Razavian, Azizpour, Sullivan, & Carlsson, 2014); it takes advantage of similar data within big datasets (Ciompi et al., 2015; Kermany et al., 2018; Wen, Shi, Chen, & Liu, 2018). Recent large fMRI projects, such as the Human Connectome Project (HCP; Van Essen et al., 2013) and BioBank (Miller et al., 2016), give us access to massive amounts of fMRI data. It is therefore now possible to directly train a DNN decoder on big fMRI data and generalize it to common fMRI studies.
In this study, we propose a DNN classifier that effectively decodes and maps an individual's ongoing brain task state by reading 4D fMRI signals related to the task. We illustrate the generalizability of this DNN for typical neuroimaging studies by testing the decoder on the classification of task sub-types.

| HCP datasets
The HCP S1200 minimally preprocessed 3T data release, which contains imaging and behavioral data from a large population of young healthy adults (Van Essen et al., 2013), was used in this study. We employed data of 1,034 participants of the HCP who had performed seven tasks: emotion, gambling, language, motor, relational, social, and working memory (WM). Further details of the recruitment process, imaging data acquisition, behavior collection, and MRI preprocessing can be found in previous papers (Van Essen et al., 2012; Van Essen et al., 2013).

| Preparation of fMRI time series for deep learning
We analyzed the HCP volume-based preprocessed fMRI data, which had already been normalized to the Montreal Neurological Institute (MNI) 152 space. Most of the seven tasks consisted of control conditions (e.g., 0-back places in the WM task and shape stimuli in the emotion task) and task conditions (e.g., 2-back in the WM task and fear stimuli in the emotion task). In each task, only one condition was selected for the next step. For tasks with only two conditions (the emotion, language, gambling, social, and relational tasks), the condition that showed a greater association with the task had priority over the other. The WM and motor tasks contained more than one task condition, and we randomly chose one (2-back body for WM and right hand for motor) from the list (Table 1).
For each task, an input sample was a continuous BOLD series that covered the entire block plus 8 s after it, to include the tail of the hemodynamic response function (HRF). Furthermore, each BOLD volume was cropped from 91 × 109 × 91 to 75 × 93 × 81 voxels to exclude the area outside the brain. Thus, the input data varied from 27 × 75 × 93 × 81 to 50 × 75 × 93 × 81 (time × x × y × z, TR = 0.72 s). A total of 34,938 fMRI 4D data items were obtained across all tasks and subjects.

| Network architecture

Figure 1 shows a flow diagram of our proposed network, which consists of five convolutional layers and two fully connected layers. In this experiment, 27 × 75 × 93 × 81 data were generated via the aforementioned preprocessing and data augmentation steps. In the first layer, we used 1 × 1 × 1 convolutional filters, which have been widely used in recent structural designs of convolutional neural networks (CNNs) because they increase nonlinearity without changing the receptive fields of the convolutional layer (Hu, Shen, & Sun, 2017; Iandola et al., 2016; Simonyan & Zisserman, 2014). These filters generate temporal descriptors for each voxel of the fMRI volume, and their weights can be easily learnt during training. After applying this type of filter, the time dimension of the data was reduced from 27 to 3. Following this, a convolutional layer and four residual blocks were stacked to extract high-level features. Our residual block is formed by replacing the 2D convolutional layer in the original residual block (He, Zhang, Ren, & Sun, 2016) with a 3D convolutional layer (Maturana & Scherer, 2015). The output channels of the four residual blocks were 32, 64, 64, and 128, respectively. We adopted a stride of two in the second convolutional layer and the last three residual blocks, so that the spatial dimensions were quickly reduced to balance GPU memory consumption.
For ease of network visualization analysis, we used a full convolution in the last convolutional layer instead of the pooling operation commonly used in CNNs. Two fully connected layers followed the stack of convolutional layers; the first had 64 channels and the second performed seven-way classification (one output for each class). In our models, the rectified linear unit (ReLU) function (Krizhevsky, Sutskever, & Hinton, 2012) and a batch normalization (BN) layer (Ioffe & Szegedy, 2015) were applied after each convolutional layer, whereas the softmax function was employed in the last fully connected layer.
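The layer sequence described above can be sketched in PyTorch, the framework the study used. The following is a minimal illustrative sketch, not the released implementation: module and class names are ours, and global average pooling stands in for the full convolution used in the original last layer.

```python
import torch
import torch.nn as nn

class ResidualBlock3d(nn.Module):
    """3D variant of the basic residual block (He et al., 2016)."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv3d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm3d(out_ch)
        self.conv2 = nn.Conv3d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm3d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # 1x1x1 projection shortcut when the shape changes
        self.down = None
        if stride != 1 or in_ch != out_ch:
            self.down = nn.Sequential(
                nn.Conv3d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm3d(out_ch))

    def forward(self, x):
        identity = x if self.down is None else self.down(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)

class TaskDecoder(nn.Module):
    def __init__(self, n_timepoints=27, n_classes=7):
        super().__init__()
        # 1x1x1 convolution over the time axis (treated as channels): 27 -> 3
        self.temporal = nn.Sequential(
            nn.Conv3d(n_timepoints, 3, 1), nn.BatchNorm3d(3), nn.ReLU(inplace=True))
        # strided convolution: 3 -> 32 channels, halving each spatial dimension
        self.conv = nn.Sequential(
            nn.Conv3d(3, 32, 3, stride=2, padding=1), nn.BatchNorm3d(32),
            nn.ReLU(inplace=True))
        # four residual blocks with output channels 32, 64, 64, 128;
        # the last three use a stride of two
        self.blocks = nn.Sequential(
            ResidualBlock3d(32, 32),
            ResidualBlock3d(32, 64, stride=2),
            ResidualBlock3d(64, 64, stride=2),
            ResidualBlock3d(64, 128, stride=2))
        # the paper uses a full convolution here; pooling keeps the sketch short
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Sequential(nn.Linear(128, 64), nn.ReLU(inplace=True),
                                nn.Linear(64, n_classes))

    def forward(self, x):  # x: (batch, 27, 75, 93, 81)
        x = self.blocks(self.conv(self.temporal(x)))
        return self.fc(self.pool(x).flatten(1))
```

A forward pass on a cropped input of shape (1, 27, 75, 93, 81) yields seven class scores, to which softmax is applied for classification.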

| Training the DNN
Big data play an important role in training DNNs. Despite the remarkable success of DNNs, applying them to a limited amount of data remains a problem. Data augmentation is an efficient way to generate more samples, and has been widely used in applications (Ciompi et al., 2015; Donahue et al., 2014; Wachinger, Reuter, & Klein, 2018). The main purpose of data augmentation is to increase variation in the data, which can prevent overfitting and improve the invariance of the neural network. Contrary to traditional images, the input images in this experiment were already aligned with the standard MNI152 template; therefore, performing data augmentation in the spatial domain was considered redundant.

The implementation of our proposed network was based on the PyTorch framework (https://github.com/pytorch/pytorch). The network was trained from scratch, with weights initialized as suggested by He, Zhang, Ren, and Sun (2015). For optimization, we used Adam with the standard parameters (β1 = 0.9 and β2 = 0.999; Kingma & Ba, 2014). Due to memory constraints on the graphics board, the batch size was set to 32. The initial learning rate was set to 0.001, and was decayed by a factor of 10 whenever the validation loss had plateaued for 15 epochs. To avoid overfitting, we used the early stopping approach, and stopped training when the validation loss reached a minimum.
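These optimizer settings map directly onto PyTorch's `Adam` and `ReduceLROnPlateau`. The skeleton below is a sketch under the assumption that the plateau rule corresponds to `patience=15`; the stand-in model and the placeholder validation loss are ours.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 7)  # stand-in for the CNN described above

# Adam with beta1 = 0.9, beta2 = 0.999 and an initial learning rate of 0.001
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))

# decay the learning rate by a factor of 10 when the validation loss
# has not improved for 15 epochs
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=15)

best_val = float('inf')
for epoch in range(30):
    val_loss = 1.0          # placeholder: compute the loss on the validation set
    scheduler.step(val_loss)
    # early stopping: track the weights with the lowest validation loss
    if val_loss < best_val:
        best_val = val_loss
```

With a flat validation loss, the scheduler fires after the 15-epoch patience window, dropping the learning rate from 1e-3 toward 1e-4.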
Our validation strategy employed a fivefold cross-validation across subjects. Prior to training, the subjects' data were partitioned as follows: training set (70%), validation set (10%), and testing set (20%; Figure 2a). The training/validation/testing assignment was rotated across the five folds. Applying the SVM-MVPA to tens of thousands of data items is time consuming; a comparison between the SVM-MVPA and the proposed method was thus not applied to the entire dataset, but to the Test-Retest task-fMRI group data described in Section 2.4.
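The key property of a subject-wise split is that every sample from a given subject lands in exactly one partition, so no subject's data leak between training and testing. A minimal sketch of one such split (the 70/10/20 ratios follow the text; the function name and toy sample list are ours):

```python
import random

def split_by_subject(subject_ids, train=0.7, val=0.1, seed=0):
    """Partition subjects (not individual samples) into train/val/test sets."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    n_train = int(len(subjects) * train)
    n_val = int(len(subjects) * val)
    return (set(subjects[:n_train]),                 # training subjects
            set(subjects[n_train:n_train + n_val]),  # validation subjects
            set(subjects[n_train + n_val:]))         # testing subjects

# toy example: 100 subjects, seven task samples each
samples = [(f"sub{i:03d}", run) for i in range(100) for run in range(7)]
tr, va, te = split_by_subject([s for s, _ in samples])
```

Rotating which subjects fall in the test partition across five seeds (or five contiguous blocks) yields the fivefold scheme.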

| Transfer learning
An important advantage of deep learning methods, CNNs in particular, over traditional methods is their reusability: a trained CNN can be directly reused on similar tasks. We used a transfer learning strategy for the trained CNN to validate the general applicability of the proposed model. The workflow of transfer training is largely similar to that of the initial training (Figure 2a), except that it starts from a model whose first four layers carry the trained weights while the output layer is untrained. We employed the TEST dataset of the TEST-RETEST task-fMRI group from the HCP (N = 43).
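In PyTorch terms, this amounts to reloading the trained weights and then replacing the output layer with a freshly initialized one. The sketch below uses small `Linear` layers as stand-ins for the convolutional stack; the class and attribute names are ours, not the released code's.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, n_classes=7):
        super().__init__()
        self.features = nn.Linear(128, 64)   # stand-in for the trained conv layers
        self.fc = nn.Linear(64, n_classes)   # output head

    def forward(self, x):
        return self.fc(torch.relu(self.features(x)))

pretrained = Decoder(n_classes=7)
state = pretrained.state_dict()   # weights from training on the big dataset

# transfer: reload the trained weights, then swap in an untrained output layer
model = Decoder(n_classes=7)
model.load_state_dict(state)      # feature layers keep their trained weights
model.fc = nn.Linear(64, 2)       # fresh two-way head (e.g., 0-back vs. 2-back)
```

Fine-tuning then proceeds as in the initial training, but on the small target dataset.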
We trained the deep model to classify two WM task sub-states: 0-back body and 2-back body. A subject-wise fivefold cross-validation was applied, with 60% of the data (100 samples from 25 subjects) used for training and 20% each for validation and testing.

For the SVM-MVPA comparison, beta images of each subject were obtained through a GLM with separate regressors, embedded in the HCP standard FEAT scripts, for each task condition. The resulting beta images were then taken as inputs to the SVM-MVPA. A searchlight analysis was also applied: a sphere with a radius of three voxels (the "searchlight") moved through each brain using a multi-class classification SVM function (fitcecoc, from the Statistics and Machine Learning Toolbox of MATLAB) with a linear kernel. The F1 score (see Section "2.6 Assessments") for each condition was calculated as the resulting map. Fivefold cross-validation was also employed: the classifier was trained on data from four-fifths of the subjects and tested on data from the remaining one-fifth.
To evaluate the applicability of the DNN to fMRI studies using small sample sizes, we trained the deep classifiers on subsets of the 43 subjects of the HCP TEST scans of varying size: N = 1, 2, 4, 8, 17, 25, and 34. To reduce variance in the accuracy estimates, all tests were applied to the RETEST data of all 43 subjects in the HCP Test-Retest dataset. Training was stopped after 120 epochs. Searchlight and whole-brain SVM-MVPA methods were also used for comparison.

| Performance evaluation
To assess the performance of the model in classifying different tasks, several metrics were computed. The F1 score was computed for each task condition from the numbers of true positives (TP), false positives (FP), and false negatives (FN): F1 = 2TP / (2TP + FP + FN), that is, the harmonic mean of precision and recall. The effect size was calculated as the mean of the pattern maps of each task divided by their SD (Cohen, 1998). Analyses were conducted in AFNI (Cox, 1996) and FreeSurfer (Fischl, 2012) (Figure 3a,b).
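The F1 score used here is the standard definition; in code, both forms of the formula agree:

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall = 2*TP / (2*TP + FP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with TP = 8, FP = 2, FN = 2, both precision and recall are 0.8, so F1 = 0.8, matching 2·8 / (16 + 2 + 2).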

| Visualization of learnt patterns
To identify the voxels contributing most to each classification, we produced pattern maps by using guided back-propagation (Springenberg et al., 2014). Figure 4 shows group statistical maps of the effect size (Cohen's d) for the GLM analysis on the task COPEs (Figure 4a-g) and for the DNN pattern maps (Figure 4h-n). As the illustrations show, the Cohen's d on the DNN pattern maps was similar to that on the GLM COPEs for the emotion, language, motor, social, and WM tasks. For example, in the language condition, a large effect size was apparent in the bilateral Brodmann area 22 in both the GLM COPEs (Figure 4c) and the DNN pattern maps (Figure 4j). In the same fashion, both maps (Figure 4d,k) revealed similar effects in Brodmann area 4 and the bilateral Brodmann area 18 for the right-hand movement condition in the motor task. For further details on annotations, see Table S1.
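Guided back-propagation modifies the backward pass of each ReLU so that only positive gradients flow back to the input, yielding a sharper saliency (pattern) map. A minimal hook-based PyTorch sketch follows; this is our illustration of the technique, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

def make_guided(model):
    """Clamp negative gradients at every ReLU during back-propagation."""
    def hook(module, grad_input, grad_output):
        # ReLU's backward already zeroes gradients where the input was <= 0;
        # guided backprop additionally zeroes gradients that are negative
        return (torch.clamp(grad_input[0], min=0.0),)
    for m in model.modules():
        if isinstance(m, nn.ReLU):
            m.register_full_backward_hook(hook)

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 1))
make_guided(model)

x = torch.randn(1, 4, requires_grad=True)
model(x).backward()
saliency = x.grad   # pattern map over the input
```

Applied to the trained 3D CNN, the same hooks propagate the class score back through every layer, producing a voxel-wise map of the features that drove the decision.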
3.3 | Transfer learning of WM task sub-types on small datasets

The DNN pattern maps were similar to the results of the GLM COPEs (Figure 6a,b).
Moreover, the SVM-MVPA searchlight method reported widespread activity scatters, rather than activity clusters, all over the brain (Figure 6e,f). Refer to Table S2 for further details on the annotations of the maps.
We then examined the amount of data needed for learning. All three methods reported higher than chance-level accuracy across all N Subj. N Subj = 8 was enough for the DNN (80.3%) to outperform the ordinary SVM-MVPA whole-brain (41.7%) and SVM-MVPA ROI (56.3%) methods in terms of accuracy (Figure 7d).

Finally, we visualized the DNN pattern maps and found that Cohen's d reached the highest values in the corresponding motor topological areas, similar to the results of the GLM COPEs and the SVM-MVPA searchlight method (Figure 8). Refer to Table S2 for further details on the annotations of the maps.

| Summary
In this study, we proposed a general deep learning framework for decoding and mapping ongoing brain task states from whole-brain fMRI signals of humans. After training and testing it using data from the HCP S1200 dataset (N = 1,034), the classifier identified seven task states with an average accuracy of 93.7%.

| Deep learning as a research tool
Deep learning is capable of automatic data-driven feature learning and has deeper models than earlier methods. Analogous to the brain's sensory networks, DNNs perform complex computations through deep stacks of simple intra-layer neural circuits. Thus, researchers have widely used DNN models to understand the human brain network, especially sensory brain networks (Eickenberg, Gramfort, Varoquaux, & Thirion, 2017; Guclu & van Gerven, 2015; Horikawa & Kamitani, 2017; Rajalingham et al., 2018; Yamins & DiCarlo, 2016). At the same time, DNNs are capable of discovering complex structures within high-dimensional input data, and can transform these structures into abstract representations (LeCun et al., 2015). These features allow researchers to efficiently model complex systems without the burden of model or prior knowledge selection, especially in cases where too many features exist, as when analyzing medical images. Thus, DNNs are widely used for medical image analysis, such as brain image segmentation (Havaei et al., 2017; Wachinger et al., 2018; Zhang et al., 2015), neurology and psychiatric diagnostics (Hosseini-Asl, Keynton, & El-Baz, 2016; Meszlenyi, Buza, & Vidnyanszky, 2017; Plis et al., 2014; Vieira et al., 2017), brain state decoding (Jang et al., 2017), and brain-computer interfaces (Schirrmeister et al., 2017).

(Displaced figure caption: (c) Accuracy of fivefold cross-validation classification on the motor task on a small dataset. The accuracy of the DNN (94.7 ± 1.7%) was significantly higher than that of the SVM-MVPA whole-brain (t[8] = 3.59, p = .0071; mean ± SD = 81.6 ± 7.1%) and SVM-MVPA ROI (t[8] = 8.77, p = .000022; mean ± SD = 68.6 ± 5.7%) methods. (d) The performance of the three methods across different numbers of subjects for training (N Subj). All conditions reported higher than chance-level accuracy; N Subj = 8 was enough for the DNN to outperform the ordinary SVM-MVPA methods. (i-l) The F1 score of the SVM-MVPA searchlight method. Collectively, the three methods identified similar brain activity maps.)

Some deep learning methods discard spatial information in learning, which may significantly affect their performance and interpretability in medical image analysis (Voulodimos, Doulamis, Doulamis, & Protopapadakis, 2018). The RNN with LSTM, a deep learning method for sequence modeling, ignores spatial information within the input data (Hochreiter & Schmidhuber, 1997). Our work departs from these studies by directly targeting the fMRI volume through a 3D CNN. The proposed 3D CNN, which makes use of the spatial structure of the input data, is efficient in capturing the spatial relationships of brain activity. As end-to-end learning methods, CNNs learn features automatically and avoid the design of a feature extractor. On the other hand, CNNs rely heavily on manually labeled training data, but this is not a problem for neuroimaging research because almost all neuroimaging data are carefully labeled with diagnostics, task states, and questionnaires. Moreover, because the CNN requires scant handcrafting of features by experts, it is easily usable by data scientists on neuroimaging data.
We used an NVIDIA GTX 1080Ti GPU in our experiments. The initial training took a long time (72 hr for 30 epochs) while transfer learning took much less time (9 hr for 120 epochs on the two-class classification task, and 21 hr for 120 epochs in the four-class classification task). The proposed CNN was composed of three convolutional layers and two fully connected layers with 3,981,852 parameters.
Given these layers and their hyperparameters, we could make count-

| Visualization of learnt patterns
The proposed method also offers researchers the opportunity to investigate decisions of the neural network. A challenge of applying deep models to neuroimaging research is the black-box characteristic of this approach: No one knows exactly what the deep network is doing. In recent years, a method for tracing consecutive layers of weights back to the original image inputs has been proposed, and has achieved good performance in natural image recognition (Springenberg et al., 2014).
Researchers have employed various methods to analyze the internal processes of DNNs (Bach et al., 2015; Guclu & van Gerven, 2015; Yamins & DiCarlo, 2016). One criticism is that good decoding performance does not guarantee that patterns of brain activity have been learned (Ritchie, Kaplan, & Klein, 2019), because a decoder may learn from nuisance or latent variables (Riley, 2019), for example, the different visual responses to different stimulus images or the patterns of response key-pressing across the seven tasks. Guided back-propagation allows scientists to intuitively locate and investigate the features the DNN detected in every entered fMRI data item. In this work, the similarity between the pattern maps and the GLM maps (Figures 4, 6, and 8) suggests that the proposed DNN decoded states from task-related brain activity patterns, not from nuisance variables. Furthermore, being correlated with the β maps of the GLM, the pattern maps showed potential for localizing state-related areas of the brain. However, the statistical properties of guided back-propagation remain unclear, and caution is warranted until its reliability and statistical properties are further investigated.

| Transfer learning helps model construction with small samples
Transfer learning is a machine learning method that learns from networks trained on a related but different task. By taking advantage of transferred knowledge, it reduces the need for big training data (Rawat & Wang, 2017). Hosseini-Asl et al. (2018) pre-trained a 3D convolutional autoencoder to capture anatomical shape variations in brain MRI scans and fine-tuned it for AD classification on images from 210 subjects. Gao et al. (2019) pre-trained a 2D CNN for classification on ImageNet, a database containing >14 million natural images, and fine-tuned it to decode 2D fMRI slices. The proposed method transfer-learns in a more direct way, transferring knowledge learnt from a big fMRI dataset to limited fMRI datasets.
We believe that the proposed DNN can transfer-learn a related but different decoding task using fMRI data from as few as four subjects (Figure 5d). Although our deep learning framework was trained and validated using the HCP S1200 dataset, the consistent internal properties of human hemodynamic responses make fMRI data reasonably consistent across scanners and sites. Today, big datasets such as BioBank, HCP, and OpenfMRI provide comprehensive neuroimaging scans across a wide range of ages and diseases, affording the opportunity for pretraining on big data and transfer learning on small fMRI datasets.

| Transfer learning to the WM task
We evaluated the generalizability of our deep learning framework in transfer learning to WM data of 43 subjects. WM refers to a brain function for the temporary storage and manipulation of information for cognitive processing (Baddeley, 1992). We chose WM because research has shown that it is not processed at a single brain site, but stored and processed in widely distributed brain regions, including parietal cortex (Christophel, 2015; Xu & Jeong, 2016). The model's performance in classifying the two tasks provided more evidence that it learnt from task-related brain activity, rather than nuisance variables, because the stimuli were consistent between 0-back and 2-back, with merely the task altered.

| Transfer learning to the motor task
We evaluated the generalizability of our deep learning framework in transfer learning to multi-class motor data of 43 subjects. Motor-related information is encoded in the primary motor cortex, premotor cortex, and supplementary motor area around the central sulcus. The topological nature of the motor area made it the first cortex to be decoded in the human brain (Dehaene et al., 1998). In our experiment, the SVM-MVPA was good at single-label classification (high F1 scores for each task in Figure 8) but delivered poor performance at multi-class classification (low accuracy in Figure 7d). The proposed method showed its advantage over the SVM-MVPA in multi-class classification. Cognitive neuroscience has traditionally attended to particular brain functions, but researchers are now calling for models that generalize beyond specific tasks (Varoquaux & Poldrack, 2019; Yarkoni & Westfall, 2017). Brain systems are often engaged in a variety of brain functions (Varoquaux et al., 2018), and predictive investigations of general tasks can ultimately lead to a greater understanding of the human brain. The proposed method provides researchers with the choice of decoding and interpreting brain functions in an integrative way.

| Future work
Although we illustrated the deep model's ability to read the fMRI time series, researchers can modify the input layer and take a volume of brain features as input to the proposed deep model, such as the amplitude of low-frequency fluctuation (ALFF), fractional ALFF (fALFF), and regional homogeneity (ReHo) of resting-state fMRI, as well as the fractional anisotropy (FA) and mean diffusivity (MD) of diffusion tensor imaging (DTI). The model is also applicable to multi-modal inputs through different channels, which is important for research in psychiatry and neurology because most of the relevant open datasets, such as ADNI (Alzheimer's Disease Neuroimaging Initiative), ABIDE (Autism Brain Imaging Data Exchange), BioBank, and SchizConnect, provide multi-modal scans. The proposed method can provide a basis for brain-based information retrieval systems by classifying brain activity into different categories, such as brain-based disorder or psychiatric classification. A variety of deep learning methods have shown their power in searching for biomarkers of psychiatric and neurologic diseases (Vieira et al., 2017), and the proposed method provides one more choice.
Activity classification can also benefit real-time fMRI neurofeedback (rt-fMRI-NF), a technology providing subjects with feedback stimuli from ongoing brain activity collected by an MRI scanner (Cox, Jesmanowicz, & Hyde, 1995;Sulzer et al., 2013). Recently, a data-driven and personalized MVPA rt-fMRI-NF method (Shibata, Watanabe, Sasaki, & Kawato, 2011), decoded neurofeedback (DecNef), was proposed, and has shown outstanding performance in both basic and clinical research (Thibault, MacPherson, Lifshitz, Roth, & Raz, 2018;Watanabe, Sasaki, Shibata, & Kawato, 2017). The proposed deep model has the potential to decode multiple brain states from whole-brain fMRI time series and to output these to feedback processing in real time. Moreover, the model can be fine-tuned to individual brain activity through transfer learning to build up a personalized rt-fMRI-NF.

| Conclusion
We proposed a method to classify and map an individual's ongoing brain function directly from a 4D fMRI time series. Our approach allows for the decoding of a subject's task state from a short fMRI scan without the burden of feature selection. This flexible and efficient brain-decoding method can be applied to both large-scale massive data and fine, small-scale data in neuroscience. Moreover, its characteristics of facility, accuracy, and generalizability allow the deep framework to be easily applied to a new population as well as a wide range of neuroimaging research, including internal mental state classification, psychiatric disease diagnosis, and real-time fMRI neurofeedback.

ACKNOWLEDGMENTS
This study was supported by the National Natural Science Foundation

CONFLICT OF INTEREST
The authors declare that the research reported here was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
AUTHOR CONTRIBUTIONS
J.G. and B.Q. conceived of the study and contributed to writing the manuscript. All authors discussed the results and reviewed the manuscript.

DATA AVAILABILITY STATEMENT
All scripts described in this paper are available at https://github.com/ustc-bmec/Whole-Brain-Conv.