A Visual Feedback Supported Intelligent Assistive Technique for Amyotrophic Lateral Sclerosis Patients

Among the diverse intelligent assistive systems developed for amyotrophic lateral sclerosis (ALS) patients, headwear eye-tracking-based ones attract broad interest owing to merits such as noninvasiveness, cost effectiveness, and high operational freedom. However, with headwear eye trackers, patients tire easily during human–machine interactions (HMIs), and the operation accuracy is not satisfactory compared with that of its counterparts. To address these two issues, a visual feedback technique is developed herein that allows users to recognize the machine's vision: a laser spot is positioned on the object the user is watching, according to the location information interpreted from the user's eye movement. Through the visual feedback technique, users not only obtain real-time feedback but can also fine-tune the laser spot to the desired location before performing further operations. Experimental results demonstrate that the presented work reduces user fatigue and boosts operation accuracy by 25.1% and 27.6%, respectively, thereby advancing the field of intelligent assistive technologies.


Introduction
Amyotrophic lateral sclerosis (ALS) patients receive global attention due to the unique and devastating character of the disease, which makes patients gradually and irreversibly lose the ability to control their muscles. [1][2][3] During disease progression, patients are fully conscious but cannot interact with the surrounding environment like healthy people. Hence, computer-assistive interactive (CAI) techniques are developed to make ALS patients' lives more convenient and to ease their mental pain. [4][5][6] Among diverse CAI systems, eye-tracking-based architectures are broadly welcomed owing to their noninvasiveness, cost effectiveness, and high accuracy. [7,8] With the help of the visual feedback function, users can fine-tune the laser spot location before giving further commands, so high operation accuracy can be reached. The experimental results validate that the presented technique successfully reduces user fatigue and boosts operation accuracy. Details are provided in the following sections.

Methodology
In this section, both hardware architecture and algorithm development are explained. The former contains an eye-tracking system and a visual feedback system, and the latter involves methods for interpreting user's intention.

Eye Tracking
As shown in Figure 2a, the eye-tracking system consists of a glasses frame, an eye camera (LRCP-10190), an infrared light source (XUV-88IR), a scene camera (HBV-1466), an antislip rubber sleeve, and two USB cables. The eye camera is attached to the right temple of the glasses via a 3D-printed bracket, about 8 cm in front of the right eye. The scene camera is secured to the left temple by tie wraps. Both cameras capture images and transmit the data to the computer through the USB cables. The parameters of the eye tracker are listed in Table 1.
Mainstream fixation-estimation techniques include pupil center corneal reflection (PCCR), [20,21] cross-ratio mapping (CRM), [22,23] and neural-network-based methods. [24,25] Among them, PCCR is an image-processing technique that extracts the pupil center and the Purkinje image under infrared illumination [26] and then fits the fixation-point position with a second-order polynomial; CRM uses four IR LEDs to form a quadrilateral of glints on the cornea, and the mapping from the eye coordinate system to the scene coordinate system is realized through cross-ratio invariance; neural-network-based methods train a two-branch network on datasets to approximate the line-of-sight state-transition model. In our application scenario, PCCR is preferred: CRM requires a complex hardware architecture, which limits its integration and portability, while neural networks require users to provide large training datasets, which is impractical for patients. The detailed PCCR-based fixation estimation is given below (shown in Figure 2b).
First, the images obtained from the eye camera are converted to grayscale and smoothed with a Gaussian filter. Second, the region of interest containing the pupil is extracted to reduce the amount of calculation. Since a fixed threshold is a free parameter that can be affected by changes in lighting or camera settings, an adaptive threshold algorithm based on the neighborhood average is adopted to segment the grayscale eye images, [27] and the eye contour is extracted by Canny edge detection. [28] Considering that most pupils captured by the eye camera appear elliptical, we first determine the pupil center and radius roughly through the Hough circle transform, and then fit the pupil contour in this area through least-squares ellipse fitting to achieve accurate positioning of the pupil center coordinates. [29,30] Generally, the Purkinje image is located around the pupil region and has the highest gray value; therefore, a preselected area can be delimited and the binarization threshold adjusted to effectively locate the Purkinje image. The extraction result of the P-CR vector is shown in Figure 3a.
After attaining the precise locations of the pupil center and Purkinje image, we use second-order polynomial regression to locate the fixation point. [31] The direction of fixation is estimated from the mapping between the P-CR vector (u, v) in the pixel coordinate system of the eye camera and the fixation point (x, y) in the pixel coordinate system of the scene camera. The mapping function can be obtained by calibration and quadratic polynomial fitting, [32] with the formula

x = a0 + a1*u + a2*v + a3*u*v + a4*u^2 + a5*v^2
y = b0 + b1*u + b2*v + b3*u*v + b4*u^2 + b5*v^2    (1)

From Equation (1), it can be seen that at least six calibration points are needed; in practice, nine points are normally used to obtain higher precision. [33,34] The calibration experiments are carried out in the laboratory environment. With the head kept still, each subject is asked to gaze at nine fixed points on the wall (3 m away from the subject) while the machine records the P-CR vector in sequence. The measured data are fed into Equation (1), and the unknown parameters are calculated by least-squares fitting. Based on the mapping function, we only need to obtain the P-CR vector in the current eye image, and the eye fixation position can then be interpreted in real time in the scene image, as shown in Figure 3b.
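The calibration described above is a linear least-squares problem in the six polynomial coefficients per axis. The NumPy sketch below is our illustration, with synthetic calibration data standing in for the nine recorded P-CR vectors:

```python
import numpy as np

def design_matrix(u, v):
    """Second-order polynomial terms of the P-CR vector (u, v)."""
    return np.column_stack([np.ones_like(u), u, v, u * v, u**2, v**2])

def calibrate(uv, xy):
    """Least-squares fit of the quadratic mapping from calibration pairs."""
    A = design_matrix(uv[:, 0], uv[:, 1])
    coeff_x, *_ = np.linalg.lstsq(A, xy[:, 0], rcond=None)
    coeff_y, *_ = np.linalg.lstsq(A, xy[:, 1], rcond=None)
    return coeff_x, coeff_y

def gaze_point(u, v, coeff_x, coeff_y):
    """Map a new P-CR vector to scene-camera pixel coordinates."""
    a = design_matrix(np.atleast_1d(u), np.atleast_1d(v))
    return float(a @ coeff_x), float(a @ coeff_y)

# Synthetic ground-truth coefficients fabricate a 3x3 grid of nine
# calibration points (the coefficient values are arbitrary).
true_cx = np.array([10.0, 2.0, 1.5, 0.01, 0.02, 0.03])
true_cy = np.array([5.0, 1.0, 2.5, 0.02, 0.01, 0.04])
uv = np.array([(u, v) for u in (10.0, 20.0, 30.0) for v in (10.0, 20.0, 30.0)])
A = design_matrix(uv[:, 0], uv[:, 1])
xy = np.column_stack([A @ true_cx, A @ true_cy])

cx, cy = calibrate(uv, xy)
x, y = gaze_point(15.0, 25.0, cx, cy)
```

Since the nine grid points fully determine the six coefficients, the fit recovers the synthetic mapping exactly; with noisy real measurements, the same `lstsq` call returns the least-squares estimate.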
In the verification test, we carry out 30 experiments for each subject, obtaining 270 fixation points (shown in Figure 3c) calculated by the PCCR algorithm. The corresponding error distribution of the estimated points is shown in Figure 3d. The results demonstrate that the calibrated eye tracker offers low angle-estimation errors averaging 0.92° and 0.82° in the horizontal and vertical directions, respectively.
It should be noted that besides the eye-tracking algorithm explained above, an eye-gaze detection algorithm is also needed to lock the fixation point. The algorithm is based on human gaze behavior: a gaze is registered when the dwell time of the fixation point within the target area exceeds a preset threshold. Generally, the dwell-time threshold is set to 800 ms, while the target area spans less than 1° relative to the scene camera.
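The dwell-time rule can be sketched in a few lines of Python. The 800 ms threshold and 1° target radius come from the text; the sampling rate, timestamp format, and restart policy of the dwell window are our assumptions.

```python
DWELL_MS = 800   # dwell-time threshold from the text
RADIUS = 1.0     # target-area radius in degrees (relative to scene camera)

def detect_gaze(samples):
    """Return the (t, x, y) sample at which a gaze locks, or None.

    `samples` is a list of (timestamp_ms, x_deg, y_deg) fixation estimates.
    A gaze is registered once the fixation has stayed within a RADIUS-degree
    circle around the dwell window's first sample for at least DWELL_MS.
    """
    start = 0
    for i, (t, x, y) in enumerate(samples):
        t0, x0, y0 = samples[start]
        # Restart the dwell window when the gaze leaves the target area.
        if (x - x0) ** 2 + (y - y0) ** 2 > RADIUS ** 2:
            start = i
            t0, x0, y0 = samples[i]
        if t - t0 >= DWELL_MS:
            return samples[i]
    return None

# 30 Hz samples: ~400 ms of wandering, then a steady fixation.
wander = [(i * 33, 10.0 + i, 5.0) for i in range(12)]
steady = [(400 + i * 33, 3.0 + 0.1 * (i % 2), 7.0) for i in range(30)]
lock = detect_gaze(wander + steady)
```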

Visual Feedback
The visual feedback system aims to allow users to know which object is being watched by the machine. To implement this, a laser-based system is developed. When the eye-tracking system determines the user's focus, the laser beam will be pointed to the target. Below, the visual feedback system is explained.

Hardware Architecture
The laser-based visual feedback system consists of a laser, a horizontal gear, a vertical gear, and a scene camera (the one integrated in the eye-tracking system). The axis gears allow the laser to be rotated freely in two directions. The scene camera is used to build up a pixel coordinate system for the objects and the laser spot, which is necessary for the gear to control the laser's orientation. The whole system setup is shown in Figure 4a.
The eye-tracking system first provides a location to the gears, based on the pixel coordinate system, and then the gear will adjust the laser spot to the desired position. To successfully implement this, the gear needs to know the current position of the laser spot, hence, a laser spot detection method is developed, which is explained below.

Laser Spot Detection
The coordinates of the laser spot should be transmitted to the computer in real time during the tracking process, meaning that both dynamic and static laser-spot detection must be supported. Considering that the characteristics of laser spots in these two states differ, two algorithms are developed.
When the laser spot is static, its brightness decreases (normally following a Gaussian profile) from the center to the margin. [35] Such a laser spot can be extracted from the background image by using the watershed transform (WST), [36] as explained below: 1) Binarize the image with a series of stepped thresholds. 2) Extract the connected regions formed under each threshold with a contour-finding algorithm; the obtained connected regions are treated as spot candidates. 3) Set a selection standard; for each spot candidate whose parameters (area, circularity, and inertia ratio) meet the standard, calculate its center pixel coordinate.
For dynamic laser spots, the circularity and inertia ratio change greatly and cannot be captured by the static spot detection algorithm. Therefore, an interframe-difference-based algorithm is proposed to extract the fast-moving laser spot by comparing pixel changes between successive frames. [37,38] For a pixel position (i, j) in the scene image, with grayscale values G_k(i, j) and G_{k+1}(i, j) in frames k and k+1, a subtraction is performed as shown in Equation (2):

D_k(i, j) = 1 if |G_{k+1}(i, j) - G_k(i, j)| > T, else 0    (2)

where T is the threshold value for binarizing the difference image. The movement of the laser spot makes the gray values of some pixels change greatly between two successive frames, so the position of the dynamic laser spot can be obtained by extracting the coordinates of the points with strong gray-level change. To suppress noise interference from the background, we determine the value of T experimentally. Through many experiments, it is found that T = 80 achieves an excellent dynamic laser-spot-localization effect. Figure 4b shows the positioning and tracking results for static and dynamic laser spots. The laser-spot location information is then conveyed to the steering-gear pan-tilt system for further positioning of the laser orientation.
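The interframe difference of Equation (2) can be written in a few lines of NumPy. The illustrative snippet below (not the paper's implementation) thresholds the difference at T = 80 and takes the centroid of pixels that became brighter as the spot's new position:

```python
import numpy as np

T = 80  # experimentally determined threshold from the text

def dynamic_spot(frame_prev, frame_next):
    """Locate a moving bright spot from two successive grayscale frames."""
    diff = frame_next.astype(np.int16) - frame_prev.astype(np.int16)
    moved = diff > T          # pixels that turned bright: the spot's new site
    if not moved.any():
        return None
    ys, xs = np.nonzero(moved)
    return float(xs.mean()), float(ys.mean())

def draw_spot(center, shape=(480, 640), radius=6, value=200):
    """Render a filled bright disk on a dark frame (synthetic test data)."""
    frame = np.zeros(shape, np.uint8)
    yy, xx = np.ogrid[:shape[0], :shape[1]]
    frame[(xx - center[0]) ** 2 + (yy - center[1]) ** 2 <= radius ** 2] = value
    return frame

prev_frame = draw_spot((100, 120))
next_frame = draw_spot((140, 125))
pos = dynamic_spot(prev_frame, next_frame)
```

Using the signed difference (rather than the absolute value in Equation (2)) isolates the spot's new location; the absolute difference would also mark the vacated pixels.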

Steering Gear Pan-Tilt Control
The servo control system receives the laser-spot location (x_l, y_l) and the fixation-point position (x_f, y_f) in the pixel coordinate system from the computer through an STM32 interface. As the spatial relationship between the scene camera and the steering-gear pan-tilt is not fixed, the error between the two locations in the pixel coordinate system cannot be compensated with a fixed mapping matrix. Hence, a PID control algorithm is developed to align the laser spot with the fixation point. The conceptual depiction of the servo control system is given in Figure 4c. The projection of the 2D distance between the two points (referred to as the input error in PID terminology) in the pixel coordinate system can be expressed as

e_x = x_f - x_l,  e_y = y_f - y_l    (3)

where e_x and e_y indicate the differences between the two points along the x- and y-axes; they are employed as the inputs of the PID control to minimize the error between the laser spot and the fixation point. The PID control algorithm involves three parallel processing steps: proportion, integration, and differentiation. [39] Among them, the proportional term controls the adjustment step length, the integral term reduces the steady-state error, and the derivative term improves the dynamic response of the system. As the fixation point changes only slightly while the user stares at the target object, the derivative step is not used.
In this article, the steering gear is controlled by pulse-width modulation (PWM), in which the angle of the actuator is set by the duty cycle. [40,41] As the orientation must be adjusted over a 2D surface, the PWM command is actually a 1 × 2 vector [PWM_x, PWM_y], each element of which can rotate the steering gear from 0° to 180°. In our system, we set 200 values for each direction, so the gear's rotation resolution is 0.9°. As explained above, the spatial relationship between the eye tracker and the steering gear changes dynamically; hence, we cannot use a fixed PWM vector. Instead, we develop an adaptive tracking algorithm:

PWM_i-new = PWM_i-pre + u_i    (4)

where PWM_i-new and PWM_i-pre are the PWM vectors of the next and previous moves, and u_i is the change of PWM for the next move, whose value is given by

u_i = k_p1 * e_i,                if |e_i| > e_M
u_i = k_p2 * e_i + k_i * Σ e_i,  if e_m < |e_i| < e_M
u_i = 0,                         if |e_i| < e_m    (5)

where k_p1 and k_p2 are proportional coefficients, k_i is the integral coefficient, and e_M and e_m are the maximum and minimum thresholds.
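For a standard hobby-style servo, the 200-step angle command maps to a duty cycle as sketched below. The 50 Hz frame and 0.5-2.5 ms pulse range are common servo conventions assumed for illustration; the paper specifies only the 200-step, 0.9° resolution.

```python
# Typical hobby-servo timing: 50 Hz frame, 0.5-2.5 ms pulse for 0-180 degrees.
# These pulse widths are an assumption, not values from the article.
FRAME_MS = 20.0
PULSE_MIN_MS = 0.5
PULSE_MAX_MS = 2.5
STEPS = 200

def pwm_for_step(step):
    """Map a 0..199 command step to (angle_deg, duty_cycle)."""
    if not 0 <= step < STEPS:
        raise ValueError("step out of range")
    angle = step * (180.0 / STEPS)            # 0.9 degrees per step
    pulse = PULSE_MIN_MS + (angle / 180.0) * (PULSE_MAX_MS - PULSE_MIN_MS)
    return angle, pulse / FRAME_MS

angle, duty = pwm_for_step(100)   # mid-range command
```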
When |e_i| is greater than e_M, proportional control with the larger coefficient is used. When e_m < |e_i| < e_M, small proportional control plus integral control is used to approach the target accurately. When |e_i| < e_m, servo control stops. In the proposed system, a total of 200 values controls the steering gear's rotation angle from 0° to 180°, and k_p1, k_p2, and k_i are experimentally determined to be 0.05, 0.01, and 0.003, respectively. With the above parameters, the tracking path diagrams and the coordinate-error curves of six randomly chosen initial points (the end point is the coordinate (320, 240)) are shown in Figure 5a.
The error between the laser spot and the target point after tracking is within 3 pixels for both the x- and y-axes. Errors from 50 experiments are given in Figure 5b. Since the scene camera has a 100° wide-angle lens and the image resolution is 640 × 480, a 3-pixel error corresponds to a mismatch of about 0.5 cm at an operating distance of 2 m, which is negligible for daily use. The experimental results show that the laser tracking system controlled by the steering-gear pan-tilt performs well in both operation speed and accuracy.
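The piecewise control law can be simulated per axis. The sketch below uses the paper's gains (k_p1 = 0.05, k_p2 = 0.01, k_i = 0.003) but assumes the switching thresholds e_M and e_m and the PWM-to-pixel gain, which the paper does not list:

```python
K_P1, K_P2, K_I = 0.05, 0.01, 0.003   # gains from the text
E_M, E_m = 50.0, 2.0                  # switching thresholds (assumed)
GAIN = 10.0                           # pixels moved per unit PWM change (assumed)

def track(spot, target, max_steps=1000):
    """Drive the laser spot toward the target along one axis."""
    integral = 0.0
    for _ in range(max_steps):
        e = target - spot
        if abs(e) < E_m:              # close enough: stop the servo
            break
        if abs(e) > E_M:              # far away: coarse proportional step
            u = K_P1 * e
        else:                         # near the target: fine PI control
            integral += e
            u = K_P2 * e + K_I * integral
        spot += GAIN * u              # pan-tilt moves, spot shifts on screen
    return spot

final_x = track(spot=40.0, target=320.0)
final_y = track(spot=400.0, target=240.0)
```

With these values the closed-loop error in the fine-control band decays with a contraction factor of about 0.95 per step, so the loop settles inside the 3-pixel band reported above well within the iteration limit.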
From eye-motion detection to laser-beam positioning, inevitable system errors mean that the laser spot may land near the target instead of directly on it. In such cases, users can employ the fine-tuning function to move the laser spot onto the target. More specifically, when the laser spot remains stationary after tracking, the user knows the focusing position determined by the machine. If the user believes that the machine's vision differs from his or hers, the user can employ the eye-control-based fine-tuning function before the desired actions (e.g., turning on a light), which are also enabled by eye control.

Eye Control
The eye control function consists of eye-blink detection and eye-status classification. The former is used for choosing the operation mode, while the latter is employed for adjusting the laser-spot location.

Blink Detection
When a person blinks, the change in pupil area is positively associated with the degree of eye openness. [42] Thus, we can detect blinks by monitoring the pupil area S, which is calculated as follows: 1) Convert the eye image to grayscale, blur it, and then enhance the image contrast with the adaptive threshold algorithm based on the neighborhood average. [27] 2) Extract the pupil edge with the Canny operator. [28] 3) Morphologically dilate the eye image and connect any unclosed pupil edges by using a 3 × 3 kernel. 4) Find the maximum closed-loop contour; S is defined as the number of pixels enclosed by its edge.
In each frame, we obtain one S value. As consecutive frames are obtained while monitoring the eye images, an S curve can be plotted to describe the change of S over time. We first search for each regional minimum point of the S curve and then eliminate spurious points by setting a reasonable area threshold for the closed pupil (T) and a minimum time interval between adjacent blinks (I). [43] Considering that S during eye closure will not exceed 1000 pixels and that a blink usually lasts about 150-300 ms, [44] each eye blink can be accurately identified by setting T to 1000 and I to 0.3 s, as shown in Figure 6a.
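The S-curve analysis can be sketched as below. This is an illustrative reimplementation: regional minima below T = 1000 are treated as blink candidates, and candidates closer together than I = 0.3 s are discarded; the 30 Hz frame rate is an assumption.

```python
T_AREA = 1000      # closed-pupil area threshold (pixels), from the text
I_MIN = 0.3        # minimum interval between adjacent blinks (seconds)
FPS = 30.0         # eye-camera frame rate (assumed)

def detect_blinks(s_curve):
    """Return blink timestamps (s) from a per-frame pupil-area sequence."""
    blinks = []
    for i in range(1, len(s_curve) - 1):
        # Regional minimum of the S curve ...
        if s_curve[i] <= s_curve[i - 1] and s_curve[i] < s_curve[i + 1]:
            # ... that is low enough to indicate a closed pupil ...
            if s_curve[i] < T_AREA:
                t = i / FPS
                # ... and not a duplicate of the previous detection.
                if not blinks or t - blinks[-1] >= I_MIN:
                    blinks.append(t)
    return blinks

# Synthetic area signal: open eye ~3000 px with two dips (blinks).
s = [3000] * 90
for start in (20, 60):                 # blink onsets at frames 20 and 60
    s[start:start + 5] = [1500, 400, 150, 400, 1500]
blinks = detect_blinks(s)
```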
Blinking events can be divided into conscious and unconscious blinks, [45] and the two types need to be differentiated during HMIs to avoid the "Midas touch" problem. [46] In this article, the time interval between blinks is used to recognize conscious blinks: as intervals between unconscious blinks are normally longer than 1 s, we treat blinks with intervals of less than 1 s as conscious ones.

Eye Status Classification
The eye-blink-based interaction offers limited information, which gives rise to difficulties in complex interactive scenarios. To address this issue, six eye statuses are defined (i.e., up, down, left, right, normal, and closed, as shown in Figure 6b). As different people possess diverse eye-status characteristics (e.g., some patients suffer from strabismus), eye-status detection should be customized to specific users. Therefore, a deep learning method based on convolutional neural networks (CNNs) is developed. [47,48] It should be noted that the amount of training data needed for eye-status classification is much less than that for fixation estimation; hence, the neural-network-based technique does not inconvenience patients in practical use.
First, we take 2100 eye images (resolution 640 × 480) of the same subject in an indoor environment for the six eye statuses (350 each). Before feeding the images into the CNN, a preprocessing step reduces the computational workload by converting the collected RGB images to grayscale and normalizing them to a fixed resolution of 64 × 64. To enlarge the dataset without increasing the subjects' burden, data augmentation is applied through random rotation (within ±20°), shift (within ±0.2 of the image width/height), and contrast adjustment (with an exponential factor between 0.25 and 4). After augmentation, 8400 images in total are obtained, which are divided into training and testing groups at a ratio of 7:3.
The structure of the CNN is shown in Figure 6c. The network used here consists of eight layers: one input layer, four convolutional layers, two pooling layers, one fully connected layer, and one output layer.
We use a softmax classifier to predict the probability p_k (k = 1, 2, …, 6) that an eye image belongs to each status by the following formula:

p_k = exp(w_k^T x) / Σ_{j=1}^{6} exp(w_j^T x)    (6)

where w_j are the parameters learned by the back-propagation algorithm. [49] Softmax defines the loss function (L) in terms of cross-entropy as

L = - Σ_i Σ_j t_ij log(y_ij)    (7)

where t_ij represents the true probability that sample i belongs to category j, and y_ij represents the prediction probability of the model. The training objective is to minimize the cross-entropy loss. Since the loss function is convex, momentum gradient descent is employed for training optimization: it calculates an exponentially weighted average of the gradients and uses it to update the weights. [50,51] When the optimization is completed, the CNN outputs the status of the eye image in real time according to the maximum probability. The model is trained for 40 iterations; the classification accuracy, loss curves, and confusion matrix of the dataset are shown in Figure 6d,e. The accuracy of eye-status classification is 98.84%, while the cross-entropy loss value is 0.0469. The results show that the CNN can recognize eye-status images effectively.
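The softmax and cross-entropy computations above can be written out directly in NumPy. The sketch below is generic, with random logits standing in for the CNN's final-layer outputs:

```python
import numpy as np

def softmax(logits):
    """p_k = exp(z_k) / sum_j exp(z_j), shifted for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(t, y, eps=1e-12):
    """L = -sum_i sum_j t_ij * log(y_ij), averaged over samples."""
    return float(-(t * np.log(y + eps)).sum(axis=1).mean())

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 6))            # 4 samples, 6 eye statuses
probs = softmax(logits)
targets = np.eye(6)[[0, 2, 5, 1]]           # one-hot "true" labels
loss = cross_entropy(targets, probs)
```

Each row of `probs` sums to 1, and a perfect prediction (`y == t`) drives the loss to zero, which is what the training objective minimizes.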
To avoid the "Midas touch" problem, we develop an interactive method based on eye-status detection that maps four different eye-status combinations to corresponding control commands. The mapping relationship is given in Table 2, in which C_S represents the initial eye status, C_M the transition eye status, and C_E the end eye status of the behavior. When a certain eye-status combination is detected, the corresponding control command is triggered, and the fixation point in the pixel coordinate system of the scene camera is translated 5 pixels in the desired direction, accompanied by automatic tracking of the laser.
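A minimal state machine for the status-combination commands might look as follows. Table 2's actual combinations are not reproduced in the text available here, so the (C_S, C_M, C_E) triples below are hypothetical placeholders; only the 5-pixel translation comes from the text.

```python
# Hypothetical (C_S, C_M, C_E) triples -> fine-tuning moves in pixels.
# The real combinations are defined in Table 2 of the article.
COMBOS = {
    ("normal", "up", "normal"): (0, -5),     # move fixation up 5 px
    ("normal", "down", "normal"): (0, 5),
    ("normal", "left", "normal"): (-5, 0),
    ("normal", "right", "normal"): (5, 0),
}

def apply_statuses(fixation, statuses):
    """Scan a status stream and apply every matched combination."""
    x, y = fixation
    for i in range(len(statuses) - 2):
        triple = tuple(statuses[i:i + 3])
        if triple in COMBOS:
            dx, dy = COMBOS[triple]
            x, y = x + dx, y + dy   # the laser tracks the new fixation point
    return x, y

stream = ["normal", "left", "normal", "up", "normal", "close"]
new_fix = apply_statuses((320, 240), stream)
```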
The system described above needs to cooperate with action functions in practical scenarios. Below, we employ a robotic arm developed in our previous work to collaborate with the visual feedback system and grasp objects for users. To achieve this, a subject uses the fixation detection, laser feedback, and eye control functions explained above. More specifically, after the system detects the subject's fixation point, a laser beam is pointed in that direction; the user can then recognize whether the machine's vision matches his or hers. If not, eye-control-based laser-beam fine-tuning is enabled. Finally, the robotic arm grabs the object once the subject is satisfied with the position of the laser spot.

Experimental Section
To evaluate the effect of visual feedback, a robotic-arm-based human-machine cooperation system is constructed, with which two experiments and a questionnaire study are performed. The overall framework of the interaction is shown in Figure 7.

Environment Construction
All experiments are carried out in the laboratory environment.
The entire system includes an eye tracker, a steering gear pan-tilt, a robotic arm, and a computer. The robotic arm is installed on the desk, while other components are integrated with a wheelchair. Four blocks representing routine objects are distributed on the desk. During the experiment, the subjects sit on the wheelchair. The whole setup is shown in Figure 8a.

Task Design
The subject needs to complete the task of grasping a specified color block, using the eye-movement interaction system with and without visual feedback. When visual feedback is not in use, the laser feedback and eye-status fine-tuning functions are also disabled. Three block distributions based on different spacing distances are designed: case 1 (>10 cm), case 2 (5-10 cm), and case 3 (<5 cm). The three distance ranges are used to evaluate the performance of the developed technique in diverse scenarios.

Experiment Protocol
Thirty healthy subjects (details in Table 3) are recruited at Beihang University. Before the formal experiment, all subjects are required to familiarize themselves with the operation of the system, and the system is calibrated for each subject. All provided written informed consent, and all study procedures were approved by the Committee for Medical Research Ethics at the First Hospital of Shijiazhuang City, China (project number 2020036). The procedures of the two experiments are described as follows:

Experiment I

1) Grab the blocks from left to right in sequence, with and without the visual feedback system; 2) After successfully grasping all four blocks, take a 5 min rest; 3) Repeat steps 1 and 2 ten times; 4) Adjust the block spacing and repeat steps 1 to 3; 5) Conduct a predesigned questionnaire survey.

Experiment II
1) Set the block distribution to case 3; 2) The subject operates the robotic arm to grab the blocks freely, using the system with and without visual feedback; 3) After all four blocks are successfully grabbed, refresh the blocks; 4) Repeat steps 2 and 3 continuously, grabbing as many blocks as possible, until the subject reports fatigue; 5) Record the number of successfully grabbed blocks every 3 min; 6) Conduct a questionnaire survey.

Results and Discussion
In this section, quantitative (interaction time and success rate) and qualitative (subjective feeling represented by questionnaire scores) results are provided to comprehensively evaluate the influence of the visual feedback technique on human-machine collaboration. The interaction time is defined as the average time taken to complete a block-grabbing action successfully, while the success rate is the number of successful attempts over all attempts. From the experimental results, we also observe two phenomena. First, subject background does not show a statistically significant difference in interaction time or operation accuracy. Second, age has a strong influence on these two indexes. One example is given in Figure 8d,e, which depicts the data of case 3 of Experiment I. It is observed that senior subjects did not perform as well as students in terms of both interaction time and accuracy, while students from different disciplines performed comparably (Table 4). Nevertheless, it should be noted that the senior subjects were satisfied with their performance and their user experience with the system, as reflected in the questionnaire answers below.
In terms of questionnaire survey I, shown in Table 5, the average score over all questions with the visual-feedback eye-movement interaction system is 3.95, which is 0.86 points higher than that without the visual feedback system. From this result, we find that the subjects agree that visual feedback plays a positive role in the eye-movement interaction system. This is because, without visual feedback, the invisibility of the system state increases subjects' doubts and cognitive burden. In contrast, with visual feedback, the subjects obtain a clear gaze position during the interaction process, which gives users confidence in their interactive operations. The results also show that the subjects generally believed that the eye-movement interaction system with visual feedback better meets users' comfort needs, reducing user fatigue by 25.1%.

Experiment II
The number of blocks grabbed by the subjects in different time periods is shown in Figure 8d. Our findings suggest that visual feedback can relieve subjects' fatigue to a certain extent, so that they can concentrate on the interactive experiment for longer, improving both efficiency and accuracy. Considering that senior subjects' physical condition is not as good as the students', they are more prone to fatigue, which explains the curve trends in Figure 8f.
From questionnaire survey II, we can draw almost the same conclusions as from questionnaire survey I. However, it also demonstrates that, under pressure, users have a stronger preference for the visual feedback function.
During both experiments, we found that the frequency of using the fine-tuning function is highly user dependent. With the developed technique, once the laser spot appears on the object's surface, the vision-based technique can recognize the object's position (the subjects were informed of this). Nevertheless, some users prefer to align the laser with the center of the object, while others stop fine-tuning as soon as the laser spot reaches the edge of the desired object. From our records, successfully grabbing an object requires an average of 2.75 fine-tuning actions, with a variance of 1.03.

Conclusion
ALS patients suffer from unimaginable mental pain and hence receive increasing attention and care. The work presented in this article showcases a feasible means for ALS patients to communicate effectively with surrounding intelligent devices and systems. Compared with conventional desktop eye-tracking and electroencephalogram (EEG)-based techniques, the proposed visual-feedback-supported intelligent system provides an enhanced user experience, indicating its strong potential for relieving ALS patients' mental stress and making their daily lives more convenient.