ConvNets for Electroencephalographic Decoding of Attempted Arm and Hand Movements of People with Spinal Cord Injury

Brain–computer interfaces (BCIs) facilitate communication between the brain and external devices, providing an alternative solution for individuals with upper limb disabilities. The decoding of brain movement commands in BCIs relies on signal feature extraction and classification. Herein, the BNCI Horizon 2020 dataset is employed, which consists of electroencephalographic signals from ten participants with subacute and chronic cervical spinal cord injuries. These participants perform or attempt five distinct types of arm and hand movements. To extract signal features, a novel technique is introduced that estimates movement‐related cortical potentials and incorporates them into the processing pipeline. Moreover, a time‐frequency domain representation of the dataset is used as input for the classifier. Given the promising outcomes demonstrated by deep learning models in BCI classification, a pretrained ConvNet AlexNet is adopted to decode the motor tasks. The proposed method exhibits a remarkable average accuracy of 76.0% across all five categories, representing a significant advancement over existing state‐of‐the‐art techniques. Additionally, an in‐depth analysis of the convolutional layers in the model is conducted to gain comprehensive insights into the classification process. By examining the ConvNet filters and activations, the method contributes to a deeper understanding of the electrophysiology that underlies attempted movement.


Introduction
Brain-computer interfaces (BCIs) establish a communication pathway between the brain and an external device. [3,4] One commonly used type of BCI relies on electroencephalographic (EEG) signals.
Decoding EEG signals can provide information about motor tasks that an individual is performing or attempting to perform. EEG-based BCIs leverage various signal features for the decoding process. An advantage of this type of BCI is the affordability and low-risk nature of EEG acquisition systems. [7,8] Certain BCIs designed for upper limb disabilities rely on movement-related cortical potentials (MRCPs). MRCPs are motor cortex potentials that occur during the execution or imagery of movement, and they are particularly prominent in EEG signals recorded from central and middle regions of the brain. [9] In their study, Xu et al. [10] aimed to classify reach-and-grasp movements, including palmar, pinch, push, twist, plug, and a resting state class. They utilized MRCPs and examined the differences in cortical EEG features and network structures across the different classes. MRCPs were projected onto a source space, and the average amplitudes in specific regions of interest were used as classification features. Functional connectivity was also assessed using the phase locking value. The results demonstrated a comparable grand average peak performance of 49.35% when employing source features with a reduced number of EEG channels. Wang et al. [11] conducted research to differentiate between EEG signals corresponding to two distinct classes of hand movement. Decoding was based on nonlinear dynamic parameters of MRCPs, and classification was performed with a linear discriminant analysis (LDA) model. The findings revealed significant differences in MRCPs between the hand movement classes, achieving an average binary decoding accuracy of 89.5%. Schwarz et al. [12] employed EEG recordings from participants executing self-initiated reach-and-grasp actions toward a glass (palmar grasp) and a spoon (lateral grasp). Their results indicated that a multiclass decoding approach, incorporating a rest state and using MRCPs as inputs to a shrinkage LDA (sLDA) classifier, yielded a maximum average peak accuracy of 62.3% with a water-based electrode acquisition system, 61.3% with a gel-based electrode system, and 56.4% with a dry-electrode system.
EEG and MRCP decoding requires appropriate feature extraction and a classification model trained through an optimization process. Deep learning models, such as convolutional neural networks (CNNs or ConvNets), have shown remarkable performance in these tasks over the last decade. [1] However, training ConvNets from scratch requires estimating a substantial number of parameters, and it also demands high-performance computers and long training times. [13] To address this limitation, transfer learning has emerged as a viable strategy. [14] Transfer learning leverages a pretrained network as an efficient approach for small datasets. [15,16] The features learned by the pretrained network can be transferred to the training process with the new dataset, requiring only fine-tuning of a smaller set of model parameters. Fine-tuning is less computationally expensive and demands fewer samples than a complete training process. Moreover, fine-tuning the model may yield favorable results in terms of generalization.
In a related study, Kumar et al. [17] presented a classification of winking signals based on EEG using transfer learning. They employed various architectures for feature extraction in combination with a fine-tuned random forest (RF) classifier. The results demonstrated that the Inception-ResNetV2 transfer learning model, in conjunction with the RF classifier, achieved a training and validation accuracy of 100%. In another study, Sinam et al. [18] proposed a P300 detection-based BCI model using information from a single channel. They enhanced the classification performance by utilizing scalogram features derived from the EEG signals, and employed transfer learning with a pretrained AlexNet as the classifier. Their findings revealed that the proposed BCI achieved high average information transfer rates of 13.23–26.48 bits min−1 for individuals with disabilities. Bressan et al. [19] utilized two datasets involving hand movements such as touching, grasping, palmar grasping, and lateral grasping. The authors used MRCPs as inputs to train a ConvNet model and compared its classification performance with that of an sLDA and an RF model. The results indicated that the ConvNet exhibited satisfactory performance on both datasets, achieving accuracies of 70% and 64%. Moreover, the ConvNet demonstrated faster preprocessing than the LDA and RF models. Khademi et al. [16] proposed hybrid models incorporating pretrained ConvNets and long short-term memory (LSTM) neural networks for motor imagery classification. They used pretrained ConvNets such as ResNet-50 and Inception-v3 to leverage more complex features for the classification task. Transfer learning and data augmentation techniques were employed to address the limitations of their small dataset, known as "BCI Competition IV dataset 2a". [20] Furthermore, the researchers used an EEG time-frequency representation obtained from the continuous wavelet transform (CWT) as input images for the ConvNet. The performance results demonstrated a maximum average accuracy of 92%.
Another advantage of employing ConvNets is the ability to visualize the learned features of the classification model, providing insights into the decision-making process of the classifier. [14] However, to the best of our knowledge, limited efforts have been devoted to understanding this information in the context of BCI applications.
In this article, we adopt the pretrained ConvNet AlexNet [21] in conjunction with transfer learning to classify EEG time-frequency information (scalograms) from five different types of hand movements in the BNCI Horizon 2020 database. [22] We propose a novel strategy that estimates MRCPs and removes them from the EEG signals using an absolute difference; our approach therefore does not use MRCPs for the classification of the different movements, but instead improves the characteristics of the scalograms fed to the classifier. Furthermore, we include a visualization of the features learned by the model to gain insights into the characteristics relevant to the classification task.
The remainder of this article is organized as follows. Section 2 provides a detailed description of the dataset, the preprocessing and processing stages, the classification model, and the associated training and visualization techniques. Section 3 presents the results of the proposed method and offers a discussion. Finally, in Section 4, we conclude the article and outline future directions for this study.

Dataset Description
BNCI Horizon 2020 database: The present study utilizes the dataset with accession number 001-2019 from Ofner et al., [22] which involved individuals with spinal cord injury. EEG data from a cohort of 10 participants with subacute and chronic cervical spinal cord injury at the AUVA Rehabilitation Clinic in Tobelbad, Austria, were employed. Participants were instructed to execute or attempt various hand movements based on their residual motor abilities. The study group consisted of nine males and one female, ranging in age from 20 to 69 years.
EEG signals were recorded using four 16-channel g.USBamp biosignal amplifiers and a g.GAMMAsys g.LADYbird active electrode system (g.tec medical engineering GmbH, Austria) at a sampling frequency of 256 Hz. Preprocessing of the acquired signals involved applying an eighth-order Chebyshev band-pass filter ranging from 0.01 to 100 Hz. Power line interference was mitigated through a notch filter at 50 Hz. A total of 61 electrodes, covering frontal, central, parietal, and temporal areas, were utilized for signal acquisition. Additionally, electrooculogram (EOG) signals were recorded using three electrodes placed above the nasion and below the outer canthi of the eyes. The reference electrode was positioned on the left earlobe, while the ground electrode was located on AFF2h.
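For illustration, the acquisition-side filtering reported for the dataset could be approximated offline with SciPy; the sampling rate, filter order, passband, and notch frequency follow the description above, while the passband ripple and notch quality factor are assumptions of this sketch.

```python
import numpy as np
from scipy import signal

FS = 256  # sampling frequency [Hz], as reported for the dataset

def preprocess(eeg, fs=FS):
    """Approximate the dataset's acquisition filtering: 0.01-100 Hz
    eighth-order Chebyshev band-pass plus a 50 Hz notch.
    `eeg` has shape (n_channels, n_samples)."""
    # 8th-order Chebyshev type-I band-pass (1 dB passband ripple assumed)
    sos = signal.cheby1(N=8, rp=1, Wn=[0.01, 100], btype='bandpass',
                        fs=fs, output='sos')
    filtered = signal.sosfiltfilt(sos, eeg, axis=-1)
    # 50 Hz notch to suppress power-line interference (Q factor assumed)
    b, a = signal.iirnotch(w0=50, Q=30, fs=fs)
    return signal.filtfilt(b, a, filtered, axis=-1)
```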
During the experiment, each participant was seated in front of a computer screen, where they received specific instructions. At the beginning of each trial, a fixation cross and an auditory beep were presented. Participants were instructed to keep their gaze fixed on the cross throughout the entire 5 s trial period to minimize eye movements. Two seconds after the trial was initiated, the class cue was displayed and remained on the screen for the subsequent 3 s until the trial conclusion. The class cue could be one of five different classes of hand movement: pronation, supination, palmar grasp, lateral grasp, or hand open. Participants were required to exclusively execute or attempt the corresponding movement immediately upon the display of the class cue. A break period of 1–3 s was provided between trials. The dataset comprised nine runs, each consisting of 40 trials, resulting in a total of 72 trials per class for each of the ten participants (refer to Figure 1 for a visual representation of the experimental setup).

Noise Removal
The common average reference (CAR) technique was employed to enhance the signal-to-noise ratio and eliminate noise in the 61 EEG channels. CAR calculates the average signal across all electrodes and subtracts this average from each individual electrode, as shown in Equation (1):

$$y_i(t) = x_i(t) - \frac{1}{N}\sum_{j=1}^{N} x_j(t) \qquad (1)$$

where $y_i(t)$ represents the denoised signal obtained after applying CAR, $x_j(t)$ corresponds to the various electrode signals, and $N$ denotes the total number of channels or electrodes. Furthermore, the length of the EEG signals was limited to the 2 s following the execution or attempted execution of the respective movement by the subject.
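A minimal sketch of the CAR step in Equation (1) and the subsequent 2 s cropping, assuming each trial is stored as a NumPy array of shape (channels, samples); the variable names are illustrative.

```python
import numpy as np

FS = 256  # sampling frequency [Hz]

def common_average_reference(eeg):
    """Equation (1): subtract the mean across all electrodes from every channel.
    `eeg` has shape (n_channels, n_samples)."""
    return eeg - eeg.mean(axis=0, keepdims=True)

def crop_post_cue(eeg, cue_sample, fs=FS):
    """Keep only the 2 s following the movement cue, as described above."""
    return eeg[:, cue_sample:cue_sample + 2 * fs]
```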

MRCP
MRCP has been utilized as a control signal in BCIs. Previous studies have demonstrated the meaningful information provided by MRCPs regarding various movements of the same limb, such as hand opening and closing, as well as different types of grasps. For individuals with spinal cord injury (SCI), EEG signals derived from movement attempts or executions can be harnessed by BCIs to control output devices. [23] Ofner et al. [22] conducted a study in which they observed that the averaged signal of the central electrode Cz exhibited the characteristic MRCP pattern during movement attempts. Moreover, they identified discriminative information, such as positive and negative peaks, within MRCPs that could be indicative of the movement class, a finding also supported by the research conducted by Zhang et al. [24] This observation suggests that MRCP patterns contain valuable information that can be used to preprocess signals before feeding them into the classifier model, ultimately enhancing its performance. Therefore, in our proposed methodology, we partitioned the entire dataset into a training set, consisting of 80% of the data, and a test set, comprising the remaining 20% of the data.
The training set was used to estimate the average signal of the central electrode Cz across multiple trials. This approach allowed us to extract the MRCP information for each of the five attempted movement classes in each subject (Figure 2).
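The 80/20 split and the per-class Cz trial average could be implemented as below; trials_cz is a hypothetical array holding the Cz channel of every trial of one class for one subject, and the split routine is only one possible choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def estimate_mrcp_template(trials_cz, test_size=0.2, seed=0):
    """Estimate the MRCP template as the trial average of Cz over the
    training trials only. `trials_cz` has shape (n_trials, n_samples)."""
    train_idx, test_idx = train_test_split(
        np.arange(len(trials_cz)), test_size=test_size, random_state=seed)
    mrcp_template = trials_cz[train_idx].mean(axis=0)
    return mrcp_template, train_idx, test_idx
```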

Independent Component Analysis
Independent component analysis (ICA) employs a generative model to estimate the underlying signal generation process. In the context of EEG, the measured signals correspond to the EEG channels. The ICA model represents these electrode signals as a linear combination of independent sources, each representing distinct and meaningful neural activity. [25] The ICA model can be described by Equation (2):

$$\mathbf{x} = \mathbf{A}\,\mathbf{s} \qquad (2)$$

where $s_i$ denotes the source signals, $\mathbf{A}$ represents the mixing matrix of constant elements, and $x_i$ represents the electrode signals.
In the present study, the number of EEG channels coincided with the number of independent sources. However, certain scenarios may arise where the number of channels exceeds the number of sources. In such instances, it is advisable to employ dimensionality reduction techniques, such as principal component analysis (PCA), as a preliminary step prior to implementing ICA.
In this study, we extracted ten independent components from the training and test datasets comprising the 61 preprocessed EEG electrode signals, which had previously undergone CAR filtering. The appropriate number of independent components was determined through experimental estimation. Subsequently, for each class and each subject, the absolute difference between the ten estimated independent components and the trial-averaged signal of electrode Cz was computed. This calculation was performed to enhance the information unrelated to MRCPs within the independent components (see Figure 3).
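The article does not name a specific ICA algorithm, so the sketch below uses scikit-learn's FastICA as one possible choice; eeg_car (the CAR-filtered channels × samples array) and mrcp_template (the trial-averaged Cz signal from the training set) are the quantities defined above.

```python
import numpy as np
from sklearn.decomposition import FastICA

N_COMPONENTS = 10  # number of independent components used in this study

def ica_minus_mrcp(eeg_car, mrcp_template):
    """Extract 10 independent components and emphasize the information
    unrelated to MRCPs by taking the absolute difference with the
    trial-averaged Cz signal (the MRCP template)."""
    ica = FastICA(n_components=N_COMPONENTS, random_state=0)
    # FastICA expects (n_samples, n_features): time samples as rows, channels as features
    sources = ica.fit_transform(eeg_car.T).T      # shape: (10, n_samples)
    return np.abs(sources - mrcp_template)        # broadcast over the 10 components
```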

Continuous Wavelet Transform
The continuous wavelet transform (CWT) offers a representation of signals in the time-frequency domain known as a scalogram. CWT has gained significant popularity in biomedical signal processing due to its advantages over other time-frequency domain representations, such as the short-time Fourier transform (STFT). Unlike the STFT, which employs periodic time-unlimited functions, CWT utilizes time-limited basis functions to analyze and decompose the time-limited events within a signal. This characteristic is particularly beneficial for capturing the nonstationary nature of biomedical signals, including EEG signals. [26] In this study, we employed CWT to generate scalograms for the ten preprocessed independent components estimated in the previous step. We selected the Morlet mother wavelet for this purpose, as it has been extensively used in biomedical signal processing. [17,27,28] We set the frequency limits to range between 0.5 and 100 Hz. The resulting scalograms were concatenated along their horizontal axis and saved as normalized grayscale images. Subsequently, these grayscale images were converted to a square size and a red-green-blue (RGB) color scheme, which were the image characteristics required by the ConvNet used in our analysis. This process enabled us to create separate training and test sets comprising scalogram images, as depicted in Figure 4.
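A sketch of the scalogram construction, assuming PyWavelets for the Morlet CWT and Pillow for the image handling; the number of scales (64) and the output size (227 × 227, the AlexNet input size) are assumptions, while the 0.5–100 Hz range and the concatenation of the ten component scalograms follow the description above.

```python
import numpy as np
import pywt
from PIL import Image

FS = 256  # sampling frequency [Hz]

def components_to_scalogram_image(components, out_size=227, n_freqs=64):
    """Build one concatenated RGB scalogram image from the ten preprocessed
    independent components (array of shape (10, n_samples))."""
    freqs = np.linspace(0.5, 100, n_freqs)          # target frequencies [Hz]
    fc = pywt.central_frequency('morl')             # Morlet center frequency
    scales = fc * FS / freqs                        # scales matching those frequencies
    panels = []
    for comp in components:
        coeffs, _ = pywt.cwt(comp, scales, 'morl', sampling_period=1 / FS)
        panels.append(np.abs(coeffs))               # scalogram magnitude
    scalogram = np.concatenate(panels, axis=1)      # concatenate along the time axis
    scalogram = (scalogram - scalogram.min()) / (np.ptp(scalogram) + 1e-12)
    img = Image.fromarray((scalogram * 255).astype(np.uint8))
    # square RGB image, as required by the pretrained ConvNet
    return img.convert('RGB').resize((out_size, out_size))
```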

Feature Extraction and Classifier: Transfer Learning
Transfer learning techniques, along with pretrained ConvNets, reduce the need for large datasets. Moreover, they decrease training time and computational cost because the pretrained ConvNet has already learned general image features, and only fine-tuning is necessary to adapt it to a different classification task. In addition, this technique helps to improve generalization, contributing to a more robust classifier. [15] In our methodology, we employed transfer learning using AlexNet, a pretrained ConvNet initially trained on a vast training set containing over a million images classified into 1000 different classes. AlexNet comprises five convolutional layers and three fully connected (dense) layers that were specifically designed for classifying the original 1000 categories. [21] As a result of extensive training, the convolutional layers of AlexNet have acquired a wealth of image feature representations. Conversely, the dense layers have learned image representations tailored to the original model's 1000 classes. [14] In our study, we performed fine-tuning of AlexNet using the training set of scalogram images. This process involved modifying the dense layers to adapt them to the new classification task comprising the five categories of arm and hand movements. For a visual depiction of the AlexNet architecture, refer to Figure 5.
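A minimal fine-tuning sketch, assuming a PyTorch/torchvision implementation of the pretrained AlexNet (the article does not specify the deep learning framework); the learning rate and the decision to freeze the convolutional features are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_finetuned_alexnet(n_classes=5, freeze_features=True):
    """Adapt ImageNet-pretrained AlexNet to the five movement classes by
    replacing the final 1000-way dense layer with a 5-way layer."""
    model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
    if freeze_features:
        for p in model.features.parameters():
            p.requires_grad = False               # keep the learned image features
    model.classifier[6] = nn.Linear(model.classifier[6].in_features, n_classes)
    return model

model = build_finetuned_alexnet()
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()
```

Only the replaced dense layer (and any unfrozen parameters) would then be updated during training on the scalogram training set.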

ConvNet Learning Visualization
In contrast to other deep learning techniques, ConvNets offer the distinct advantage of enabling visualization and interpretation of the learned image representations. In this study, we employed three techniques for this purpose.
Visualization of ConvNet filters: this technique allows us to observe the visual pattern to which each filter in a ConvNet layer is maximally responsive.
Visualization of layer activations: by visualizing the layer activations (outputs) for a specific input image, we gain insights into the decomposition of the input across the different feature maps produced by the layer filters. [14]
Gradient-weighted class activation mapping (Grad-CAM): Grad-CAM leverages the gradient of the classification score with respect to the learned features of the ConvNet to generate a heatmap highlighting the regions of an input image that contributed to a specific classification outcome. [29]
Together, these techniques enhance our understanding of the learned representations within the ConvNet and provide valuable insights into the model's decision-making process.
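As a concrete illustration of the third technique, the sketch below computes a Grad-CAM heatmap for the fine-tuned PyTorch AlexNet from the previous sketch; the hook-based implementation and the choice of layer (e.g., model.features[10], the fifth convolutional layer) are illustrative rather than the exact code used in this study.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_class, layer):
    """Grad-CAM heatmap for `target_class`, computed at `layer`.
    `image` is a (1, 3, H, W) tensor; `layer` is e.g. model.features[10]."""
    acts, grads = {}, {}
    h1 = layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    model.eval()
    image = image.clone().requires_grad_(True)      # ensure gradients reach the layer
    score = model(image)[0, target_class]           # classification score of interest
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    # weight each feature map by the spatially averaged gradient of the class score
    weights = grads['g'].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * acts['a']).sum(dim=1, keepdim=True))
    # upsample to the input resolution and normalize to [0, 1]
    cam = F.interpolate(cam, size=image.shape[-2:],
                        mode='bilinear', align_corners=False)
    return (cam / (cam.max() + 1e-12)).squeeze().detach()
```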

MRCP
The characteristic peaks of the MRCP can be observed in the trial-averaged signal recorded from the central electrode Cz for each of the five classes. In the period of approximately 0.5 s following the class cue, the trial-averaged signal exhibited a positive peak, succeeded by a negative peak around 1 s. This distinctive morphology was previously described by Ofner et al. [22] Figure 6 illustrates the trial-averaged signal recorded from the Cz electrode across subjects for each of the five movement classes.

Performance Evaluation
To evaluate the performance of the classification model, we utilized the scalogram test set. The classification accuracy was computed as the ratio of correctly predicted observations to the total number of observations. Moreover, we employed a confusion matrix to obtain two additional performance metrics: precision and recall (sensitivity). Precision quantifies the ratio of true positive observations to the total number of predicted positive observations, while recall or sensitivity measures the ratio of true positive observations to the total number of actual positive observations in the data. [30]
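With scikit-learn, for example, these metrics can be computed from hypothetical label arrays y_true and y_pred obtained on the scalogram test set.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

def evaluate(y_true, y_pred):
    """Accuracy, confusion matrix, and precision/recall
    (macro-averaged over the five movement classes)."""
    return {
        'accuracy': accuracy_score(y_true, y_pred),
        'confusion_matrix': confusion_matrix(y_true, y_pred),
        'precision': precision_score(y_true, y_pred, average='macro'),
        'recall': recall_score(y_true, y_pred, average='macro'),
    }
```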

Movement Classification
The proposed method has demonstrated a mean accuracy of 76%, marking a significant enhancement in movement classification compared to the findings by Ofner et al., [22] who achieved a maximum average accuracy of 45.5%. Moreover, the proposed method has shown superior classification performance across all five classes, with per-class accuracy exceeding 70%, as illustrated in the confusion matrix shown in Figure 7. For a comprehensive analysis of the performance comparison between the two methods for each class, refer to Table 1.
Table 2 presents an overview of results obtained by other researchers who employed various datasets related to movement attempts. Our method achieved an accuracy surpassing that of all other works, with the exception of the studies conducted by Wang et al. [31] and Aly et al. [32] It is noteworthy that Aly et al. utilized not only EEG signals for decoding wrist and hand movements but also incorporated electromyography (EMG) signals, which directly reflect the muscle activity controlling these movements.

Visualization of ConvNet Filters
We successfully visualized the filter patterns of the convolutional layers in our model, providing insights into how ConvNet layers decompose their inputs using learned filters. Moreover, as we delved deeper into the model, the complexity of the filter patterns increased. For instance, filters in the initial layer encoded edges in various directions, while deeper layers learned filters that captured textures formed by combinations of edges. Ultimately, the deepest layer aimed to encode intricate textures (Figure 8).

Table 1. Performance comparison between the state-of-the-art method and our proposed method for the five arm and hand movement classes.

Visualization of Layer Activations
Using a specific scalogram image as input, the ConvNet's decomposition of that input can be visualized through feature maps across the various convolutional and pooling layers. In Figure 9, we illustrate feature maps obtained from convolutional layers 1, 3, and 5. The chosen input corresponds to scalogram number 72, derived from subject 10 and representing the supination class. Figure 10 also shows the maximum activation within the first, third, and fifth convolutional layers for the same scalogram.

Grad-CAM
We employed the Grad-CAM visualization technique to generate heat maps depicting class activation on targeted input scalograms. The use of Grad-CAM provided valuable insights into the pertinent regions of the scalograms that contributed to the model's decision-making process. Figure 11 presents the original scalograms alongside the corresponding Grad-CAM visualizations for pronation and palmar grasp in a single subject. In addition, Figure 12 showcases the original scalograms and Grad-CAM visualizations for supination in two different participants.

Discussion
The extraction of MRCPs was conducted to investigate their discriminative patterns, as highlighted by Ofner et al. [22] However, accurately estimating these patterns requires a substantial dataset. On the other hand, Zhang et al. [24] performed a statistical analysis revealing significant differences in negative peak amplitude across certain motions, while no such distinction was observed for positive peak amplitude. Consequently, an additional processing step is imperative to enhance the performance of the classification model. This article introduces a novel approach wherein the MRCP information is disregarded by means of an absolute difference analysis, focusing solely on the remaining information. The experimental findings indicate a notable increase of over 25% in average accuracy compared with the methodology proposed by Ofner et al. [22] These results suggest that the most relevant features for hand movement commands in EEG signals may not be inherent in MRCPs.
Similar investigations into decoding arm and hand movements or motor imagery have been conducted by Bressan et al., [19] Wang et al., [31] Xu et al., [10] Aly et al., [32] and Schwarz et al. [12] In studies relying solely on EEG signals, the models exhibited comparable or lower performance than the approach proposed in this article, although it is worth noting that these models were trained and tested on different datasets. However, when a combination of EEG and EMG was employed, accuracy improved significantly, reaching up to 95.2%. Wang et al. [31] achieved an accuracy of 96.6% for a binary classification task involving motor imagery versus rest. Nevertheless, in a multiclass scenario for upper limb motor imagery detection encompassing MRCPs, their average performance reached an acceptable level of 76.2%. These findings imply two key observations: first, successful binary classification can be attained using EEG features; however, when decoding specific hand commands is desired, these features may not provide the necessary information. Second, in contrast to the research works presented in Tables 1 and 2, our results indicate that MRCPs might have a negative influence on the multiclass classification process.
In the ConvNet architecture, the learned filters may exhibit associations with specific frequency components, particularly in the initial layers (refer to Figure 8a). Nevertheless, a distinct shift along the x-axis becomes apparent, with a visible absence of horizontal patterns and a predominance of diagonal and vertical patterns. As we delve deeper into the network (Figure 8b,c), more intricate patterns emerge, characterized by the relevance of temporal locations and narrower frequency bands. A direct comparison between these patterns and MRCPs proves challenging due to the ICA process and the concatenation of scalograms used to construct the input images for the ConvNet. Subsequent analysis holds promise for shedding light on the underlying physiological mechanisms; however, to the best of our knowledge, there is a lack of related studies for meaningful comparison.
The maximum activation within the first convolutional layer revealed the presence of vertical edges, as depicted in Figure 10a,b. Nonetheless, a notable shift in activation patterns occurred in the third layer, where the maximum activation became dependent on high frequencies of the independent components, as illustrated in Figure 10c. In contrast, for the fifth layer, maximum activation was associated with low frequencies, as demonstrated in Figure 10d. Importantly, these distinctive characteristics persisted across all ten independent components.
The analysis of the Grad-CAM results revealed significant variations in the time-frequency content of the scalograms across different types of movements and participants, as depicted in Figure 11. Specific patterns emerged for subject 5: during pronation, the initial low-frequency components exhibited the highest significance in the classification process, whereas for palmar grasp, medium- and high-frequency components took precedence. A similar phenomenon is highlighted in Figure 12, which shows two distinct subjects within the same class. In the case of supination, Grad-CAM identified different regions of interest in the classification problem, with varying scalogram regions for subjects 5 and 10. Our approach differs from the study by Ieracitano et al., [33] whose primary goal was to investigate the cortical areas involved in the classification of movement intention. In our Grad-CAM analysis, we focused on the time-frequency features, whereas Ieracitano et al. primarily focused on the specific location of the relevant activity.

Conclusions and Future Work
The combination of time-frequency representation and pretrained ConvNets, such as AlexNet, offers a promising approach for classifying brain signals associated with different hand movements. Scalograms, which provide an image representation of EEG signals, effectively assist ConvNets in learning significant spatial patterns that are valuable for the classification task. Furthermore, the incorporation of a priori knowledge regarding MRCPs during signal preprocessing facilitates the extraction of relevant information, leading to notable improvements in classification performance.
On the other hand, the use of Grad-CAM allows us to observe that the independent component scalograms and their corresponding regions of interest varied among subjects and hand movement classes.
For future research endeavors, we intend to explore alternative pretrained ConvNet architectures, as well as other models such as LSTM networks and hybrid models. This comparative analysis will enable us to assess the classification performance, robustness, and universality of our proposed method.
Moreover, we plan to evaluate our methodology using a dataset comprising hand movements performed by healthy subjects. By conducting this comparison with the dataset of individuals with SCI, we can gain valuable insights into the transferability and adaptability of our approach across different populations.
Finally, we aim to leverage the information obtained from Grad-CAM and the layer activations to conduct a more detailed physiological analysis. Additionally, integrating techniques similar to those proposed by Ieracitano et al. [33] to probe the cortical regions housing the relevant neural activity would offer a more holistic view, encompassing not only the optimal time-frequency features but also the ideal spatial locations for the classification task. This will provide deeper insights into the underlying mechanisms of the observed neural responses during hand movements, offering a comprehensive understanding from a physiological perspective.

Figure 4. Concatenated scalogram image from the ten preprocessed independent components.

Figure 5. AlexNet network for classification of five types of arm and hand movements.

Figure 6. Grand averages of electrical potentials at the Cz electrode for each class.

Figure 7. Confusion matrix obtained with the proposed method for the classification of five types of arm and hand movements.

Figure 8. a) First 16 filter patterns for the first convolutional layer, b) filter patterns for the third layer, and c) filter patterns for the fifth layer.

Figure 9. a) Input image: scalogram 76 from subject 10 for the supination class, b) first 16 feature maps for convolutional layer 1, and c) third and d) fifth layer.

Figure 10. a) Input image: scalogram 76 from subject 10 for the supination class; maximum activation for the input image in the b) first, c) third, and d) fifth convolutional layers.

Table 2. Characteristics and average accuracy from other state-of-the-art methods that used different datasets.

Study                 Signals        Task                                   Classifier     Accuracy
Aly et al. [32]       EEG and EMG    Five wrist and hand movements          ConvNet-LSTM   95.2%
Wang et al. [31]      EEG            Seven upper limb motor imagery tasks   LSTM           76.2%
Bressan et al. [19]   EEG            Two datasets, four hand movements      ConvNet        70% and 64%
Schwarz et al. [12]   EEG            Three datasets, two hand movements     sLDA           62.3%, 61.3%, and 56.4%
Xu et al. [10]        EEG            Six hand movements                     sLDA           49.4%
Our method            EEG            Five hand movements                    ConvNet        76.0%