Functional neural interactions during adaptive reward learning: A functional magnetic resonance imaging study

A key feature of human learning is flexibility in adjusting the weight given to new information when updating predictions. This flexibility can be captured computationally by changing the learning rate in a reinforcement-learning model. Key components of reinforcement learning, such as prediction error (δ), learning rate (α), and reward feedback (r), have been mapped to various brain areas. However, questions regarding the functional integration patterns in the human brain under the modulation of these learning factors, and their interactions, remain unanswered. To investigate these phenomena, we first applied a reinforcement-learning model with an adaptive learning rate, combined with functional magnetic resonance imaging, to model each individual's reward-learning behavior. Psychophysiological interaction (PPI) analysis was then used to examine whole-brain functional interactions under the experimental conditions of reward (r), the integration of reward and learning rate (α × r), and the integration of prediction error and learning rate (α × δ) in a reward-learning task. The behavioral analyses indicated that the model estimates of α and δ captured the participants' reward-maximizing behavior, with α and δ varying across difficulty levels and after different reward feedback. The PPI results showed that motor-related regions (including the supplementary motor area, precentral gyrus, and thalamus) contributed to cognitive control regions (including the middle temporal gyrus, anterior and middle cingulate gyrus, and inferior frontal gyrus) under α × r. Finally, α × δ modulated the interaction between subregions of the striatum.


| INTRODUCTION
Humans can learn in dynamic environments and determine their future behaviors accordingly. Adaptive learning behavior is characterized by a process of flexible belief adjustment or updating. Researchers have used model-based reinforcement learning to investigate this flexibility in adaptive learning during a reward-learning task. 1,2 Human decision making can be framed as a computational model of reinforcement learning. In the model, the adaptive reward-learning process involves collecting information from outcomes (reward feedback), calculating the difference between the prediction of the outcomes and the actual environmental outcomes (prediction error, δ), and deciding how to update predictions about the future according to errors in prediction (learning strategy or learning rate). Computational models of reinforcement learning infer the free model parameters by fitting the model to the behavior. In one trial of a reward-learning task, one can directly observe the state of reward feedback, actual outcomes, and the participant's prediction. Prediction in the next trial is then updated using the prediction error in the current trial, weighted by a learning rate, which controls the influence of the prediction error on the updating of prediction. The prediction error and prediction together provide trialwise estimates of the participant's learning rate.
At the behavioral level, recent research has demonstrated that the reinforcement-learning model of reward learning captures the patterns of behavioral performance during a reward-learning task. [3][4][5] In reward-learning tasks, the reward feedback affects the subsequent strategy of participants performing a gambling task. 6,7 In our previous study, we used a reinforcement-learning model with a flexible learning rate to account for human reward-learning behavior. The results showed that the learning rate estimates were higher after high reward than after low reward. 8 In addition, humans seem to use asymmetric learning rates to weight positive and negative prediction errors. 7,9,10 In the reinforcement-learning model, 11 the updating of prediction at time i + 1 (denoted as Δp_{i+1}) is the prediction error (δ_i) multiplied by the learning rate (α_i) at time i; thus, Δp_{i+1} = α_i × δ_i.
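The update rule above can be sketched as a single delta-rule step. This is an illustrative sketch with our own variable names, not the authors' MATLAB implementation:

```python
def update_prediction(p, reward, alpha):
    """One delta-rule step: delta_i = r_i - p_i, then p_{i+1} = p_i + alpha_i * delta_i."""
    delta = reward - p            # prediction error, delta_i
    p_next = p + alpha * delta    # update: Delta p_{i+1} = alpha_i * delta_i
    return p_next, delta

# Example: current prediction 0.5, high reward coded as 1, learning rate 0.2
p_next, delta = update_prediction(0.5, 1.0, 0.2)
# delta = 0.5 and p_next = 0.6
```

A larger α moves the prediction further toward each new outcome, which is exactly the flexibility the adaptive-learning-rate model manipulates trial by trial.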
At the neural level, existing model-based reinforcement-learning analyses, combined with functional magnetic resonance imaging (fMRI), have located brain structures that encode these model estimates in humans. For example, change in reward feedback activates the dorsolateral prefrontal cortex 12 ; learning rate is correlated with fMRI signals in the anterior cingulate cortex (ACC), 13 anterior insula, inferior frontal gyrus (IFG), 14 and medial prefrontal cortex (mPFC). 15 Additionally, the prediction error is encoded in subcortical structures such as the striatum 16,17 and cortical areas such as the mPFC. 16 See Figure 1 for an illustration of reinforcement learning and the related neural responses in the brain. Furthermore, a great number of researchers have investigated functional connectivity during reinforcement learning and/or reward processing. [3][4][5] For example, using fMRI, Camara and coworkers 3 investigated functional connectivity patterns while participants performed a gambling task featuring unexpectedly high monetary gains and losses, and found responses for the gain and loss conditions in the insular cortex, amygdala, and hippocampus that correlated with the activity observed in the seed region of the ventral striatum. Li and coworkers 4 used a similar method and found that reduced responses of the nucleus accumbens, ventromedial prefrontal cortex, and hippocampal complex to reward feedback were functionally correlated with activation of the dorsolateral prefrontal cortex.
However, questions regarding the functional integration patterns in the human brain under the modulation of learning factors, and their interactions, remain unanswered. One question is how brain regions integrate to reflect the joint neural effect of learning rate and prediction error on updating predictions. Another is how reward feedback modulates the learning rate and learning strategy, viewed from the perspective of functional integration among brain regions.
Psychophysiological interaction (PPI) analysis is a useful method to test how the temporal correlation of neural activity between brain regions is modulated by a psychological variable. Therefore, in the present study, in order to illustrate the joint effects of learning rate and prediction error, as well as of reward feedback and learning rate, we performed PPI analysis to investigate functional interactions under the corresponding experimental conditions. Existing studies implicate the lateral/medial prefrontal areas, striatum, insula, amygdala, and parietal regions in specific aspects of decision making. [13][14][15][16] We hypothesized that the integration of learning rate and prediction error, as well as of reward feedback, would occur in regions associated with each factor, such as the medial frontal gyrus (mFG), IFG, and striatum.

| Participants
In the present study, we recruited 33 normal, right-handed college students (19 men). All participants provided informed consent, as required by the Institutional Review Board of the University of Electronic Science and Technology of China (UESTC). Data from eight participants were excluded for the following reasons: two owing to head movement; four owing to synchronization failure (the fMRI scanning and the task did not start simultaneously) in at least one run; and two owing to poor performance (no response in more than half of the trials in one run). Therefore, the final sample in the fMRI analysis included 25 participants (15 men; ages 21-25 years, mean age 22 years).

| Experimental design
Each participant underwent fMRI scanning while performing a decision task, which was projected onto a screen situated within the scanner bore. All participants had normal or corrected-to-normal vision. The participants viewed the screen via a mirror mounted on the head coil, and they responded by pressing a button on one of two response boxes (one box assigned to each hand). The presentation of stimuli, recording of responses, and external scanner trigger timing pulses were controlled using Psychophysics Toolbox Version 3 (http://psychtoolbox.org/).
In each trial, participants were instructed to choose between a red square and a green square to accumulate reward points, which determined the monetary reward they received after the task (Figure 2A). At the beginning of each trial, a fixation cross was presented at the center of the screen, along with the two colored squares (one on each side of the screen), for exponentially stepped intervals (4-5.5 seconds; step size, 0.5 seconds), during which the participants chose one square by pressing the button on the same side as the selected square. Once a response was made, the unselected square was removed from the screen. After this interval, the reward obtained (either one point or five points, corresponding to a low or high reward, respectively) was presented at the center of the screen for one second, followed by a fixation cross shown for exponentially stepped intervals (4-5.5 seconds; step size, 0.5 seconds). In the case of no response (<0.3% of all trials), no points were obtained and the message "+0" was shown. The total points obtained were displayed at the bottom of the screen throughout the task.
The decision task with a flexible learning rate has been described in detail in our previous study. 8 Briefly, this task consisted of eight runs of 40 trials each. Participants were instructed to obtain as many points as possible. Additionally, they were informed of the following: (a) in each trial, one color was more likely to lead to a higher reward than the other color; (b) the higher reward color was reset at the beginning of each run and could change during the course of a run. Unbeknownst to the participants, for any trial, the probabilities of the two colors being the higher reward color summed to 1. This constraint was used to maintain the chance of obtaining a high reward at 50%. To create a wide range of probability that either color was the more rewarding one, the following two conditions were created (four runs for each condition; the order of runs was balanced among participants): (a) in a "difficult" run, the underlying probability of obtaining a high reward by selecting the red square changed every four trials, either in the order of 0.2, 0.4, 0.6, 0.8, 0.6, 0.4, 0.2, 0.4, 0.6, 0.8, or its reverse; (b) in an "easy" run, the probability of obtaining a high reward by selecting the red square remained at 0.2 or 0.8 (two runs for each probability) throughout the run.

FIGURE 1 Diagram of the reinforcement-learning model estimates and related neural responses in the brain. The first line represents the fMRI signal of one subject in one run. The second line represents the modulation of the fMRI signal by reward feedback and the related neural response. The stripe spacing in the figure represents each trial; there are a total of 40 trials in one run. In the same way, the third and fourth lines represent the modulation of the fMRI signal by prediction error and learning rate, respectively [Color figure can be viewed at wileyonlinelibrary.com]
This design ensured that the mean probability of obtaining a high reward by selecting either color was constantly 0.5 (ie, the two colors were equally likely to be the high rewarding one) across this task. Before beginning, all participants read the instructions for the task and performed a practice run to ensure that they understood the task. After the scanning session, the participants received monetary compensation, which consisted of a fixed amount for participation in the study plus an additional award based on the score of a randomly selected run.
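The run structure described above can be sketched as a small simulation. This is an illustrative sketch of the probability schedule and reward sampling (the 1-point/5-point payoffs and block order follow the task description; function names are ours), not the authors' stimulus code:

```python
import random

def difficult_run_probabilities(reverse=False):
    """Per-trial probability that red yields the high reward in a 'difficult'
    run: the probability steps every four trials through ten blocks."""
    blocks = [0.2, 0.4, 0.6, 0.8, 0.6, 0.4, 0.2, 0.4, 0.6, 0.8]
    if reverse:
        blocks = blocks[::-1]
    return [p for p in blocks for _ in range(4)]  # 10 blocks x 4 trials = 40

def sample_reward(choice_is_red, p_red_high):
    """Return 5 (high) or 1 (low) points for the chosen colour; green's
    probability of the high reward is 1 - p_red_high, so they sum to 1."""
    p_high = p_red_high if choice_is_red else 1.0 - p_red_high
    return 5 if random.random() < p_high else 1

probs = difficult_run_probabilities()  # 40 per-trial probabilities for one run
```

In an "easy" run, `probs` would simply be `[0.2] * 40` or `[0.8] * 40`; averaging over both orders and both easy probabilities keeps each color's overall chance of being the more rewarding one at 50%.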
To simulate how participants learn which color leads to a better chance of receiving a high reward, we adapted the flexible control model, which has been shown to capture the flexible learning of control demands in a changing environment. This model had the following five variables (Figure 2B): (a) the (volatility-driven) flexible learning rate (α), which quantifies the model's (or participant's) belief concerning the rate of change of the color corresponding to the higher reward; (b) the probability of the red square leading to a high reward (p); (c) the observed color selection (s; either 0 or 1, corresponding to green or red, respectively); (d) the observed reward (r; either 0 or 1, corresponding to a low or high reward, respectively); and (e) the observed outcome (o), which is determined by s and r and encodes whether the selection resulted in the expected outcome. The trial-by-trial color selection (s) and obtained reward (r) were submitted to the model (see below) to generate trial-by-trial estimates of α and p. The choices of equations and processing steps have been validated in our recent study. 8 For each trial, we could directly observe the prediction error, denoted δ, which is defined as the distance between the previous prediction and the outcome. We could also observe the update, defined as the subsequent shift in the participant's prediction. These two variables together provide trial-wise estimates of the participant's learning rate. We implemented these procedures using MATLAB R2013a. The scripts are available on request.
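The trial-wise learning-rate estimate described here (the update divided by the prediction error) can be sketched as follows. This is our simplified illustration of that ratio only; the full volatility-driven model is described in the cited previous study:

```python
def trialwise_learning_rates(predictions, outcomes):
    """For each trial i, delta_i = outcome_i - prediction_i and the update is
    prediction_{i+1} - prediction_i; their ratio is a trial-wise alpha_i."""
    alphas = []
    for i in range(len(predictions) - 1):
        delta = outcomes[i] - predictions[i]           # prediction error
        update = predictions[i + 1] - predictions[i]   # shift in prediction
        alphas.append(update / delta if delta != 0 else float("nan"))
    return alphas

# Example: predictions that shift 40% of the way toward each outcome
alphas = trialwise_learning_rates([0.5, 0.7, 0.82], [1.0, 1.0, 0.0])
# both estimates are approximately 0.4
```

A constant ratio recovers a fixed learning rate; in the flexible model the ratio is free to rise after high rewards or volatile stretches of the run.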

| Image acquisition
Experiments were performed using a GE MR750 3.0 T scanner (General Electric, Fairfield, Connecticut) at UESTC. The anatomical images were acquired using a T1-weighted axial sequence parallel to the anterior commissure-posterior commissure line. Each anatomical scan had 156 axial slices with the following parameters: spatial resolution = 1 mm × 1 mm × 1 mm; field of view (FOV) = 256 mm × 256 mm; and repetition time (TR) = 8.124 ms. The functional images were acquired using a T2*-weighted single-shot gradient echo planar imaging (EPI) sequence with a TR of 2 seconds. Each functional volume contained 43 axial slices with the following parameters: spatial resolution = 3.75 mm × 3.75 mm × 3.3 mm; FOV = 240 mm × 240 mm; echo time = 28 ms; and flip angle = 90°. Each fMRI run lasted for 416 seconds (208 TRs). During image acquisition, head movement was monitored in real time using software that integrated data from the MR scanner system. When head movement exceeded 2.5 mm or 2.5° within a run, scanning for that run was restarted using a new trial sequence.

| Image preprocessing
The data were preprocessed using SPM8 (www.fil.ion.ucl.ac.uk/spm/software/spm8). The first four time points of each run were discarded. The remaining functional images for each participant were realigned to the mean functional image using rigid-body transformation (seventh-degree B-spline interpolation) to correct for head movements, unwarped, and coregistered to the anatomical T1-weighted image using a normalized mutual information function. Next, the structural images were segmented and normalized to a common stereotactic space (the Montreal Neurological Institute 152 T1 template). Subsequently, the normalization parameters were applied to the functional images, which were resampled to 3 × 3 × 3 mm³ isotropic voxels using seventh-degree B-spline interpolation. Finally, the resampled functional volumes were smoothed using a Gaussian kernel (FWHM = 8 mm).

| Psychophysiological interaction analysis
Conventional functional connectivity analysis provides information about how spatially separated brain areas are temporally correlated with each other. PPI analysis 18 further tests how the temporal covariance of neural activity between brain regions is modulated by a psychological variable. That is, PPI analysis identifies how the influence of A (source region 1) on B (source region 2) is altered by the experimental context or task, based on regression models.
PPI analyses identify voxels in which activity is more related to activity in a seed region of interest (seed ROI) in a given psychological context, such as during a particular behavioral task. 19 In other words, a PPI analysis aims to identify regions whose activity depends on an interaction between psychological factors (the task) and physiological factors (the time course of a region of interest). A task-specific increase in the relationship between brain regions (a PPI effect) is suggestive of a task-specific increase in the exchange of information. In the present study, we used PPI analysis to identify functional connectivity patterns of reward learning under the psychological contexts of r, α × r, and α × δ. These psychological contexts were the participant receiving reward feedback (ie, r); the interaction of learning rate and reward feedback (ie, α × r), reflecting the modulation of the learning rate by reward feedback; and the interaction of learning rate and prediction error (ie, α × δ), reflecting the updating of learning. The PPI models were estimated in the following steps.
A general linear model (GLM)-based analysis was first performed on the preprocessed fMRI data at each voxel, for each individual. This GLM consisted of the following five regressors, time-locked to the onset of the reward feedback in each trial: the stick function, the reward feedback (r), the signed reward prediction error (δ, or r − p), the updating in learning (α × δ), and the interaction between reward (r) and α (ie, α × r), which accounted for the behavioral pattern whereby participants increased their learning rate after receiving a high reward. All regressors were concatenated across the eight runs of this experiment.
We first divided the brain into 90 regions of interest (ROIs) using the automated anatomical labeling (AAL) template. 20 We then extracted each individual's average time series of blood-oxygenation-level-dependent (BOLD) activity within each ROI from the resulting GLM corresponding to the model estimate of interest. A new GLM was then applied with the following regressors: the interaction between BOLD activity in the ROI and an indicator function for the model estimate of interest; and the original BOLD time series in the ROI. These regressors were convolved with the canonical hemodynamic response function such that the observed BOLD signal would be a linear combination of these regressors. The individual beta values from the PPI analysis were used in a second-level analysis of the group effect and tested against the estimated model parameters.
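The regressor construction just described can be sketched numerically. This is a simplified illustration with an approximate HRF: it forms the interaction directly at the signal level, omits the deconvolution step of standard PPI, and none of the function names come from SPM:

```python
import numpy as np
from scipy.stats import gamma

def hrf(t):
    """Rough double-gamma HRF (peak ~6 s, undershoot ~16 s); an illustrative
    stand-in for SPM's canonical HRF, not identical to it."""
    return gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 16)

def ppi_regressors(roi_ts, psych_ts, tr=2.0):
    """Return the three PPI regressors: the seed BOLD time course, the
    HRF-convolved psychological variable, and their interaction term."""
    h = hrf(np.arange(0, 32, tr))
    psych_reg = np.convolve(psych_ts, h)[: len(psych_ts)]
    interaction = roi_ts * psych_ts                    # elementwise product
    ppi_reg = np.convolve(interaction, h)[: len(roi_ts)]
    return roi_ts, psych_reg, ppi_reg
```

A significant beta on the interaction regressor, over and above the seed and psychological regressors alone, is what the text interprets as a context-dependent change in coupling between regions.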

| Statistical analyses
A correction for multiple comparisons was applied using the false discovery rate (FDR) across all voxels showing the main effect of repetition in each experiment (at a voxelwise P-value threshold of <.05). 21 For the behavioral and model data, we conducted repeated measures ANOVAs and two-tailed t tests (significance was set at P < .05) using IBM SPSS (IBM Corp., Armonk, New York).
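The Benjamini-Hochberg FDR procedure referred to here can be sketched in a few lines. This is an illustrative implementation with made-up P values, not the voxelwise pipeline used in the study:

```python
def fdr_bh(pvals, alpha=0.05):
    """Benjamini-Hochberg: reject the k smallest P values, where k is the
    largest rank with p_(k) <= (k / m) * alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= k_max
    return reject

# Hypothetical P values: only the two smallest survive FDR at alpha = .05
print(fdr_bh([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]))
# [True, True, False, False, False, False, False, False]
```

Note that several values below the raw .05 threshold fail the FDR criterion, which is the point of the correction across many voxels.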

| RESULTS
We performed PPI analysis to investigate the functional interactions of the whole brain, under the experimental condition of reward, the integration between reward and learning rate, and the integration between the prediction error and learning rate, in the 25 participants as they performed a decision task. In each trial, the participants were instructed to choose between a red square and a green square to accumulate reward points, which determined the monetary reward they received after the experiments were complete (see Figure 2A). Importantly, at each moment, one color had a better chance of leading to a high reward than the other color (the sum of the two probabilities was always 1). To maximize reward, the participants had to learn which color was more rewarding. To create a wide range of beliefs regarding the more rewarding color, the task contained two conditions. In the "easy" condition, the more rewarding color and its probability of yielding a high reward remained constant at 80% throughout a run. Conversely, in the "difficult" condition, the more rewarding color and its probability of yielding a high reward varied across time (from 20% to 80%; step size = 20%). Across the whole task, each color had a 50% chance of being the more rewarding color.

| Behavioral results
To test whether participants followed the task instructions for learning to choose the color with the higher reward, we conducted one-sample t tests. Participants chose the higher rewarding color more frequently than chance level (ie, 50%) under both difficulty conditions (easy condition: 69.23 ± 1.1%, t(24) = 18.11, P < .001, one-sample t test; difficult condition: 52.38 ± 1.0%, t(24) = 2.48, P = .021, one-sample t test; Figure 3A). These outcomes indicated that participants followed the task instructions to learn the more rewarding color. Additionally, participants chose the higher reward color more frequently under the easy condition than under the difficult one (t(24) = 10.35, P < .001, paired t test; Figure 3A); this suggests that the manipulation of the difficulty of learning to choose the high reward color was successful.
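The chance-level comparison above corresponds to a one-sample t test against 0.5. A sketch with synthetic per-participant proportions (the values are made up for illustration, not the study's measurements):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# 25 hypothetical per-participant proportions of higher-reward choices
easy_prop = rng.normal(loc=0.69, scale=0.05, size=25)

# Test whether the group mean differs from the 50% chance level
t_stat, p_val = stats.ttest_1samp(easy_prop, popmean=0.50)
# With a true mean well above 0.5, p_val falls far below .05
```

The paired comparison between conditions reported next is the analogous `stats.ttest_rel` on the two per-participant proportion vectors.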
To further probe how participants adjusted their choice of color based on the reward obtained, we tested the frequency with which participants repeated their previous color choice. Because the higher rewarding color was more likely to remain the same as that in the previous trial, participants could be expected to repeat their choices more often than at chance level. As expected, the overall frequency of repeated choice was significantly higher than chance (74.19 ± 9.5%, t(24) = 12.73, P < .001, one-sample t test). To further test the difference in the frequency of choice repetition among the experimental conditions, we conducted a 2 (obtained reward: high/low) × 2 (difficulty: easy/difficult) repeated measures ANOVA. This ANOVA revealed a significant main effect of the reward obtained [F(1, 24) = 116.15, P < .001; Figure 3B], driven by a higher frequency of repeating the previous color choice after receiving a high reward (55.26 ± 6.8%) than after receiving a low reward (18.93 ± 6.1%); this suggests that a high reward reinforced the participants' belief that the previously selected color was the higher rewarding one. The main effect of difficulty was also statistically significant [F(1, 24) = 440.45, P < .001], driven by a higher likelihood of repeating the previous color choice under the easy condition (80.83 ± 9.1%) than under the difficult condition (67.55 ± 10.8%). This difference reflected the fact that the higher reward color changed more frequently in the difficult condition than in the easy one. The reward type × difficulty interaction was also significant [F(1, 24) = 106.68, P < .001].

| Model-based behavioral analysis
The flexible learning model generated trial-by-trial estimates of α and δ, which drove the learning of the more rewarding color (Figure 4A). To test whether the learning rate (α) and prediction error (δ) reflected the behavioral patterns, we conducted 2 (obtained reward: high/low) × 2 (difficulty: easy/difficult) repeated measures ANOVAs on the absolute values of δ and α, respectively.
The first ANOVA revealed a significant main effect of the reward obtained [F(1, 24) = 34.79, P < .001; Figure 4B], whereby high rewards (0.21 ± 0.002) were associated with lower δ estimates than low rewards (0.22 ± 0.003). The main effect of difficulty was also statistically significant [F(1, 24) = 231.36, P < .001; Figure 4B], driven by lower δ estimates under the easy condition (0.39 ± 0.004) than under the difficult condition (0.47 ± 0.004). This difference possibly reflected the fact that the higher rewarding color changed more frequently in the difficult condition than in the easy one. Additionally, we found a significant interaction between the reward type and the difficulty [F(1, 24) = 59.98, P < .001] in the δ estimates.
The second ANOVA revealed a significant main effect of the reward obtained [F(1, 24) = 333.81, P < .001; Figure 4C], whereby high rewards (0.162 ± 0.002) were associated with higher α estimates than low rewards (0.156 ± 0.002). This difference possibly reflected the fact that an increased learning rate followed a high reward. The main effect of difficulty was also statistically significant [F(1, 24) = 8.50, P = .008; Figure 4C], driven by lower α estimates under the easy condition (0.059 ± 0.001) than under the difficult one (0.100 ± 0.002). Additionally, we found a significant interaction between the reward type and the difficulty [F(1, 24) = 113.30, P < .001] in the α estimates.
Taken together, these results suggest that the flexible learning model estimates of α and δ captured the behavioral patterns and, therefore, provided meaningful learning-related information for the subsequent imaging analyses.

| PPI analysis results
In the behavioral results, we showed that the reward feedback (high or low reward) affected the subsequent strategy of color selection. These results also revealed that the learning rate and prediction error mediated the learning strategy: the reward feedback modulated the learning rate, and the learning rate in turn weighted the reward prediction error to update future decisions. To study the brain connectivity patterns of the model estimates and their interactions, we conducted PPI analyses under the experimental contexts of r, α × r, and α × δ. The reward-modulated PPI analysis showed that the interactions (ie, functional connectivity) between the right orbital part of the middle frontal gyrus (MFG) and the ipsilateral medial frontal gyrus (mFG), as well as between the left postcentral gyrus and the right superior temporal gyrus (STG), were altered by reward feedback (see Figure 5A, Figure 6A, and Table 1).
The key imaging analysis in this study concerned how the flexible learning rate interacts with reward feedback and prediction error. The behavioral analysis revealed an increased learning rate following a high reward. Accordingly, we tested the interaction between the updated learning rate and the reward feedback. The PPI analysis of the integration of reward and learning rate showed that the interactions between the right precentral gyrus and MFG, the supplementary motor area (SMA) and middle temporal gyrus (MTG), the left medial superior frontal gyrus (mSFG) and Brodmann area 39 (BA39)/STG, the right insula and MTG, the right ACC and MTG, the right middle cingulate cortex (MCC) and BA40, and the right thalamus and IFG were modulated by α × r (see Figure 5B, Figure 6B, and Table 2).
The PPI analysis of the integration of prediction error and learning rate showed that the interactions between the left caudate and left putamen, as well as between the right mSFG and the ipsilateral mFG, were modulated by α × δ (see Figure 5C, Figure 6C, and Table 3).

| DISCUSSION
In our present study, we applied a reinforcement-learning model with a flexible learning strategy (ie, learning rate) to simulate human behaviors in a reward-learning task. All participants successfully followed the task instructions. In addition, we performed an fMRI-based PPI analysis to examine whole-brain functional interactions under r, α × r, and α × δ. Altered functional connectivity with the mFG was found in all three PPI analyses, which suggests that the mFG serves as a key hub in cognitive processing. The integration between reward and learning rate engaged the decision control and motor processing pathways, suggesting an integration of decision control and motor information during cognitive decision making. Finally, as expected, the integration between prediction error and learning rate implicated a pathway related to the striatum, which has been reported as encoding the prediction error in previous studies. 8

Our behavioral analysis revealed an increased learning rate and frequency of repeating a color choice after receiving a high reward. These observations are consistent with the existing understanding that reward feedback dominates cognitive decision making and modulates the learning rate. 22 All PPI analyses of the three model estimates of interest (ie, learning rate, and its integration with reward and prediction error) showed altered functional connectivity with the mFG. This suggests that the mFG may serve as a key hub that is integrally involved in cognitive decision making. The mFG has been found to be involved in cognitive functions in fMRI analyses, 23 as well as in electroencephalogram studies. 24 Regions correlated with the mFG include the insula, MFG, and mSFG. These regions are mainly located in the frontal cortical networks, which govern executive and cognitive functions such as decision making 25 and control monitoring. 26 The insula has also been found to play a role in motor control and cognitive functioning. 27

Engagement of the mFG and these cortical regions during participants' performance of a reward task suggests that these pathways may encode the process of cognitive function. The PPI analysis of the reward and learning rate interaction revealed an integration of decision control and motor information during cognitive decision making. The results showed a large number of altered functional connectivity patterns in cognitive control areas (including the MTG, ACC, MCC, and IFG) and motor-related regions (including the SMA, precentral gyrus, insula, and thalamus). A previous study indicated that decision making involves the neural processes of incorporating new information, selecting actions, and central execution, such as speed-accuracy tradeoffs or decision caution. 28 This evidence suggests that the interaction of reward and learning rate engages the decision hierarchy (frontal and temporal cortices) and motor information processes. Additionally, it is noteworthy that most of these regions are connected with the MTG, which is related to language-related cognition such as syntax comprehension and syntactic processing. 29 This region may serve as a central station for reward evaluation and processing because it is functionally connected with not only motor cortices (SMA, precentral gyrus, and insula) but also cognitive control areas (ACC, MCC, and IFG).
Previous fMRI studies have located brain structures, such as the striatum and mPFC, that encode prediction error in humans. 30 Interestingly, our PPI analysis of the interaction between prediction error and learning rate revealed functional connectivity between the left caudate and putamen, as well as between the right mSFG and the ipsilateral mFG. Both the caudate and the putamen are parts of the striatum, and they are anatomically and functionally interconnected. 31 Our results further confirm the notion that subcortical regions (striatum) and cortical regions (medial prefrontal cortex) are involved in the reward pathway of the brain.
One limitation of this study is that the task used only two levels of reward magnitude, which resulted in a correlation between reward magnitude and reward prediction error. In addition, given that the goal of the present study was to maximize reward, obtaining a low reward can be seen as the outcome of an "incorrect" response, which may further drive learning through prediction error.

| CONCLUSION
In conclusion, we applied fMRI-based PPI analysis to investigate the functional connectivity patterns of model estimates and their interactions in a flexible learning model. The findings suggest the following: (a) the mFG may serve as a key hub in the processing of cognitive decision making; (b) the interaction between reward and learning rate integrates decision control and motor information during cognitive decision making; and (c) the interaction between prediction error and learning rate involves a striatum-related pathway. This study reports the first probe of the functional patterns of model estimates and their interactions in a reinforcement-learning model using a flexible learning strategy.

DISCLOSURE OF INTERESTS
None.

AUTHOR CONTRIBUTIONS
T.W. performed the data analyses and wrote the manuscript. X.W. contributed to the conception of the study. J.J. contributed to the interpretation and discussion of the analysis results. C.L. contributed to the data acquisition. M.Z. contributed to the research design and draft revision. All authors reviewed the manuscript.

ETHICS STATEMENT
This study was performed in accordance with the recommendations of the Institutional Review Board of Chengdu University of Information Technology, with written informed consent from all subjects.

DATA AVAILABILITY STATEMENT
The datasets analyzed in the current study are not publicly available because further analysis of the datasets is currently being conducted in our research. However, the data are available from the corresponding author upon reasonable request.