Assistive Magnetic Skin System for Speech Reconstruction: An Empowering Technology for Aphonic Individuals

Individuals with a voice disorder or speech sound disorder (SSD) (i.e., aphonic or mute people) can have any combination of difficulties with perception, articulation/motor production, and phonotactics, which may impact their speech intelligibility and acceptability, making it challenging for them to communicate with the public. As a result, many individuals suffer from frustration, isolation, and depression. Natural verbal communication for SSD patients is now more feasible than ever thanks to advancements in wearable artificial skins and machine learning. This article presents an assistive magnetic skin system for speech reconstruction (AM2S-SR), which enables SSD-afflicted people to communicate with their mouths. Using magnetic field sensors integrated into Magnetphones, the system reads the movement of the mouth by tracking the movement of magnetic skin patches attached next to the bottom lip. The measured magnetic field data are then processed using a Fine K-Nearest Neighbor machine learning classifier. The classified data are then exported verbally on speakers, or visually on a display. AM2S-SR successfully identifies the full English alphabet with an average success rate of 94.96%, thus enabling SSD people to talk using the mouth and have a more natural conversation with others.


Introduction
In the United States alone, approximately 18.5 million individuals have a speech, voice, or language disorder. [1] Speech sound disorder (SSD) is defined as any combination of difficulties with perception, articulation/motor production, and phonotactics, which may impact speech intelligibility and acceptability. [2] The inability to speak is a result of several different phenomena, such as a disruption in the neurological process, loss of voice due to injury or paralysis, or a motor speech disorder characterized by a weakness or inability to control the speech muscles. [3] People with SSD encounter plenty of challenges, not only in the way they communicate with others but also in self-esteem or image trauma, depression, [4] and social development. [5] Children with SSD often do not receive schooling, and those who do underachieve relative to their peers, [6] which subsequently hampers their prospects as adults. [3,5,7,8] To communicate, SSD patients typically employ sign language interpreters or resort to lip-reading or vocalization. Even though there is an international sign language, there are more than 300 different practiced sign languages, [7] which renders standardization impractical. In addition, the majority of the public cannot understand sign language (e.g., less than 0.2% of the population in the United States understands sign language), which makes it challenging for SSD-afflicted people to communicate with the public. [9,12-15] Existing assistive solutions for people with a speech disorder include interpreter gloves, [16] camera-based solutions, [17] and neural decoders.
[18] However, these technologies offer limited vocabularies, require continuous attention, or are invasive, which makes them impractical for a natural conversation with the public. To enhance the experience of the existing technologies, machine learning (ML) algorithms have been integrated into cameras to decipher sign language and expressed emotions, and into vocal cord monitors. However, the user still needs to face the camera directly for accurate hand and face detection. On top of that, sign language was originally designed to express words (i.e., not sentences) and does not have a well-defined grammatical or sentence structure. Because of these constraints, existing technologies provide an inadequate solution for a natural conversation between a person with a speech disability and the general public.
Recently, wearable electronics have been implemented to empower people with disabilities, leveraging additional benefits and new opportunities. For example, Almansouri et al. employed a magnetic skin system to track facial expressions of people with quadriplegia in order to help them move around and control their surroundings independently. [19,20] CUR Smart Pain Relief [21] and Valedo [22] used 3D gyroscopes, accelerometers, and magnetometers embedded into wearable sensors to treat skeletal system diseases, such as osteoporosis or joint disorders, employing transcutaneous electrical nerve stimulation. These technologies are functional, user-friendly, and customizable for people with impediments. [23] With the current advancements in artificial skins and wearable electronics, an efficient, functional, and comfortable solution for SSD patients is more viable than ever. In this work, an assistive magnetic skin system for speech reconstruction (AM2S-SR) for SSD-afflicted individuals is implemented. The AM2S-SR combines the intelligence of ML with the practicality of a breathable, biocompatible, and extremely flexible magnetic skin. The system offers a new approach for tracking the movement of the mouth. Combined with ML, it allows the detection and prediction of the pronounced letters, allowing SSD patients to have a natural conversation with the public verbally and visually.

Concept
The AM2S-SR is a comprehensive solution that allows SSD patients to talk using the mouth and have a more natural conversation with others verbally and with ease. The AM2S-SR consists of 1) magnetic skins (i.e., a biocompatible, highly flexible, and stretchable magnetic composite) attached to different locations near the lips, 2) two Magnetphones (i.e., magnetic field trackers) with integrated magnetic field sensors positioned in front of the lips for tracking the variation of the magnetic field, 3) a head-unit with a built-in microcontroller, 4) an ML algorithm for letter reconstruction, and 5) a display or speaker for visual and verbal interaction.
Figure 1A shows an illustration of a mute person using the AM2S-SR to communicate naturally with others (e.g., the public or another mute or SSD-afflicted person). The AM2S-SR is envisioned to allow the user, by talking with their mouth (even if they are mute), to have a more natural and effortless conversation. Figure 1. AM2S-SR for SSD-afflicted people. It tracks the movement of the mouth and translates it into words and letters, allowing SSD patients to communicate with others more easily. A) On the left, a mute person is communicating through sign language, which is not understandable by the public. On the right, a patient employing AM2S-SR to have a natural communication. B) The AM2S-SR system consists of 1) magnetic skin attached near the lower lip of the user; 2) the Magnetphone device with integrated magnetic field sensors, to track the magnetic skin and, therefore, the movements of the mouth while talking; 3) a display for visual communication and a speaker for audio communication.
The operation of the AM2S-SR is as follows: as the user talks, the lips move and the attached magnetic skin moves as well. As a consequence, the magnetic field surrounding the mouth changes. The magnetic field sensors in the Magnetphones measure these changes, and the data are streamed into the head-unit for further processing. After that, the processed data are analyzed using the ML algorithm to predict the intended words and letters. Finally, the predicted words are exported visually on a display, or verbally using speakers.
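The steps above can be sketched as a minimal sense-classify-output loop. This is only an illustration of the data flow; every function name here is a hypothetical placeholder, not part of the actual firmware:

```python
def run_pipeline(read_magnetphones, clean, classify, speak, show):
    """One iteration of the AM2S-SR loop: sense -> clean -> classify -> output.
    All callables are illustrative stand-ins for the real components."""
    sample = read_magnetphones()        # 6 raw values: [x, y, z] right + [x, y, z] left
    letter = classify(clean(sample))    # e.g., a trained Fine KNN model
    speak(letter)                       # verbal output on the speaker
    show(letter)                        # visual output on the display
    return letter

# toy stand-ins demonstrating the flow of one sample through the loop
letter = run_pipeline(
    read_magnetphones=lambda: [0.1, 0.2, 0.3, 0.1, 0.2, 0.3],
    clean=lambda s: s,                  # the real system applies median filtering here
    classify=lambda s: "A",             # the real system uses the trained classifier
    speak=print,
    show=print,
)
```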

The Magnetic Skin
The correct placement of the magnetic skin is on the line where the inferior cutaneous lip and the lower lip vermillion meet, as shown in Figure 1B. The purpose of this location is to capture the movements of the lips and translate them into unique variations in the magnetic field. Examples of these movements include opening the mouth (e.g., by moving the cheek downward), making an O-shape, stretching the lips to the sides (e.g., smiling), combined motions (e.g., stretching the lips while the mouth is open), and so on. Note that the words and letters that we speak are generated by these movements (various combinations and degrees of movements as a function of time). Furthermore, the major division in speech detection is between vowels and consonants. The articulations and structures of the vocal tract involved in verbal speech are detailed in S1, Supporting Information.
Figure 2A shows the workflow chart of the AM2S-SR. Figure 2B shows the variation in the magnetic field as a function of time for the combined movements produced by pronouncing the letters A, O, and N. Note that each letter has a different magnetic field structure (e.g., a different shape and period) for each axis of the magnetic field sensors. In other words, each expression has a unique magnetic field signature, and by tracking these variations the AM2S-SR ML algorithm can interpret the intended letters, as discussed later.
The location of the magnetic skin should be chosen such that as many letters as possible can be tracked while talking, even if the patient has injured vocal cords, while avoiding the detection of ordinary movements or activities such as laughing, swallowing, or sneezing; it is also desirable that the location does not bother or discomfort the user. Therefore, the skin is placed on the lower lip, where it detects the largest possible number of mouth articulations while talking and remains minimally intrusive. Such a configuration allows the patient to use other accessories such as eyeglasses, braces, piercings, or collars. Different samples of magnetic skins have been placed and tested on six different people (i.e., three males and three females), with a placement error of 5 mm from test to test due to differences in lip anatomy, self-placement, and variation of schedule (i.e., tests were done on different days, locations, and times). Furthermore, the users were able to hold conversations without being disturbed by the magnetic skin. Movie S1, Supporting Information, shows a demonstration of the users with the proposed configuration.

The Magnetphone
Two Magnetphones are placed to the sides of the lips (i.e., one at each side of the face), each of which consists of a 3-axis magnetic field sensor to track the 3D movement of the lips (i.e., in the coronal, sagittal, and transverse planes). This two-Magnetphone configuration provides a better signal-to-noise ratio and allows the detection of asymmetric lip movements; for example, patients with apraxia have the language capacity to talk, but the brain signals sent to the mouth are miscoordinated, so the mouth movements are not properly coordinated and the production of coherent sounds and sentences does not enable proper communication. [3] Figure 2C shows a photograph of the Magnetphones placed about 2 cm away from the face. Note that the Magnetphone still works when placed up to 3.5 cm away from the face. However, the strength of the magnetic field drops as the distance increases. The complete design of the Magnetphone is shown in S3, Supporting Information.

The Head-Unit
The head-unit acts as the central hub of the AM2S-SR; it receives the data from the Magnetphones and combines them into an array of vectors. In other words, the input to the head-unit is six data streams (3-axis magnetic field right + 3-axis magnetic field left = 6 data streams) and the output is an array of vectors representing the measured raw data. Each vector stores the raw data as [x, y, z]right and [x, y, z]left.
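A minimal sketch of this packing step, assuming nothing beyond the description above (the function name and the sample values are illustrative, not the actual firmware):

```python
import numpy as np

def pack_sample(right_xyz, left_xyz):
    """Combine one 3-axis reading from each Magnetphone into a single 6-element vector."""
    return np.concatenate([right_xyz, left_xyz])

# two consecutive readings from each sensor -> array of vectors, one row per time step
right_stream = [(1.0, 2.0, 3.0), (1.1, 2.1, 3.1)]
left_stream = [(4.0, 5.0, 6.0), (4.1, 5.1, 6.1)]
samples = np.array([pack_sample(r, l) for r, l in zip(right_stream, left_stream)])
# samples has shape (2, 6): each row is [x, y, z]right followed by [x, y, z]left
```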
The head-unit consists of a microcontroller and is interfaced with the Magnetphones using the inter-integrated circuit (I2C) protocol. The complete design of the head-unit is discussed in S4, Supporting Information. In this work, the ML algorithm and the visual and verbal export of the data are implemented on a separate computer; however, these features will eventually be integrated into the head-unit.

Data Processing and ML Classification
ML and data processing are achieved by cleaning the raw data, preparing the training dataset, training and testing the model, and using the exported model for AM2S-SR letter reconstruction, as shown in Figure 3A.
Cleaning the data is achieved by passing the raw data array through a median average function (window size = 50 points, S5, Supporting Information) to reduce noise and outliers. Preparing the training dataset is achieved by grouping the collected samples of all the English letters; each of these groups is considered a predictor to train the ML model later. In other words, the training dataset consists of 26 predictors to train the ML model to identify all the English letters. On average, each predictor has about 60 samples, for a total of about 1560 samples. Of these, about 45 samples per predictor are used to train the model, and the remaining 15 samples per predictor (25% of the total data) are used to test it.
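The cleaning step can be sketched as a running median over a sliding window. This is a plain re-implementation of the idea, not the paper's exact function (which is given in S5, Supporting Information); the 50-point window follows the text, and the edge handling (a shrunken window) is an assumption:

```python
import numpy as np

def median_filter(signal, window=50):
    """Running median over a sliding window to suppress spikes and outliers.
    A window of 50 points follows the paper; edges use a shrunken window."""
    signal = np.asarray(signal, dtype=float)
    half = window // 2
    out = np.empty_like(signal)
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out[i] = np.median(signal[lo:hi])
    return out

# a single-sample spike is removed even by a small window
print(median_filter([1.0, 1.0, 9.0, 1.0, 1.0], window=3))  # -> [1. 1. 1. 1. 1.]
```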
A nonparametric supervised ML Fine K-Nearest Neighbor (Fine KNN) classifier is used for letter reconstruction. In short, KNN works by calculating the distances between an observation (i.e., an array of vectors representing a specific letter) and the training dataset. According to the majority vote and the nearest-neighbor classification rule, the observation is assigned to the most appropriate class (i.e., the appropriate letter). [24] The following is a KNN classification example with the letter "B" as a new observation imported into the classifier. First, the k nearest points (i.e., the number of nearest neighbors included in the majority vote) to the observation are examined. If the majority of the k points belong to G_B (the class of letter B), the observation is assigned to G_B. Otherwise, the observation is assigned to, for example, G_E (the class of letter E, which also has a short distance to the observation). S6, Supporting Information, discusses the KNN algorithm in more detail.
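A bare-bones version of this majority-vote rule, written from the description above with toy data standing in for the letter classes (this is not the paper's MATLAB implementation):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=1):
    """Assign x to the majority class among its k nearest training points
    (Euclidean distance); k = 1 mirrors a 'fine' KNN setting."""
    dists = np.linalg.norm(np.asarray(X_train) - np.asarray(x), axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# toy observations: two clusters standing in for the classes G_B and G_E
X_train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y_train = ["B", "B", "E", "E"]
print(knn_predict(X_train, y_train, [0.2, 0.5], k=3))  # two of the 3 nearest are "B"
```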

Measurement Results and Discussion
The results show a 98.2% validation accuracy, implying that the model is suitable for AM2S-SR letter detection. Furthermore, to evaluate the performance of the Fine KNN for AM2S-SR letter detection, a test confusion matrix using the test dataset is implemented. Figure 3C shows the test results highlighting the true positive rates (TPR) and the false negative rates (FNR). The accuracy of the classifier ranges between 87.8% and 99.7%, with an average accuracy of 94.96%. Table 1 highlights the performance for each letter and sorts them in descending order (from highest to lowest). The letters with the highest TPR are N, B, and Z. On the other hand, the letters with the lowest TPR are A, Q, and K, which are confused with M, U, and J, respectively. Table 2 contrasts the similarities between those samples, highlighting the misclassification between the true class and the predicted class. The versatility of the magnetic skin and the potential of AM2S-SR allow the user to accommodate more pieces of magnetic tattoos if the area of detection needs to be increased, as well as a larger number of Magnetphones if the patient's circumstances require a different arrangement for detecting the mouth movements that characterize each letter (i.e., symmetric or asymmetric motion of the lips).
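The per-letter TPR and FNR of Figure 3B,C follow from row-normalizing a confusion matrix. A small sketch with made-up counts (the 2-class matrix here is purely illustrative, not the paper's data):

```python
import numpy as np

def per_class_rates(confusion):
    """Row-normalize a confusion matrix: TPR is the diagonal share of each
    true-class row, and FNR is the remainder (TPR + FNR = 1 for every class)."""
    cm = np.asarray(confusion, dtype=float)
    tpr = np.diag(cm) / cm.sum(axis=1)
    return tpr, 1.0 - tpr

# illustrative counts: rows = true class, columns = predicted class
tpr, fnr = per_class_rates([[9, 1],
                            [2, 8]])
# tpr -> [0.9, 0.8], fnr -> [0.1, 0.2]
```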

Conclusion
In summary, the AM2S-SR utilizes magnetic skins and a KNN ML classifier model to allow mute people to communicate verbally and visually with the public. The system works by attaching magnetic skin patches to the lower lip of the patient for accurate tracking of lip movements. In combination with magnetic field sensors and a Fine KNN ML classifier, AM2S-SR successfully identified all the English letters with an average success rate of 94.96%. Next, AM2S-SR needs to be trained to predict a wide range of words, including standard sign language, the most popular phrases, and simple sentences. Furthermore, the system should be trained to accommodate aphonic individuals who might have a combination of other speech sound disorders, such as apraxia and asymmetric mouth movement coordination.

Experimental Section
Magnetphone: The Magnetphone was designed using a magnetic field sensor (BM1422AGMV) soldered on a 6 × 1.4 cm² printed circuit board (PCB). The Magnetphone was then fixed into a frame that held it about 1 cm in front of the lower lip.
Head-Unit Design: The head-unit was powered by a 5 V battery and used to read and process the data from the magnetic sensors. The head-unit PCB was realized using a microcontroller (Bluno Nano, DFRobot). The connections between the head-unit and the Magnetphones were made through 5-pin flat flexible cable (FFC) ribbon connectors using an I2C communication protocol with 4.7 kΩ pull-up resistors.
Data Collection and Processing: The data were collected by recording, in Arduino, the changes in the magnetic field from the magnetic skins attached to the lips while the user talked, as measured by the magnetic field sensors. The user was asked to pronounce each letter 20 times with a delay of 1 s between samples, while trying to avoid abrupt movements such as coughing or laughing while talking. This process was replicated in three different scenarios (i.e., inside the laboratory, in a conference room, and in an office), in three different directions (i.e., x, y, and z) to obtain a diversified environment, on different days, at different times (i.e., morning, midday, and night), under different conditions (i.e., temperature, light, electronic signals coming from computers, laboratory equipment, and so on), and with different sets of magnetic skin patches (i.e., six pairs of different magnetic skin patches). The data were collected while no action such as walking, eating, or twisting the body was involved; however, involuntary natural movements (i.e., breathing, scratching between samples) did not interfere with this process. The samples were saved in a .csv file and imported into MATLAB for data processing. After data cleaning and smoothing, the training dataset with its corresponding predictors was imported into the Classification Learner. Note, this project was approved and regulated by the Institutional Biosafety and Bioethics Committee (IBEC) of King Abdullah University of Science and Technology (IBEC number 18IBEC19) and supported by informed written consents from all participants.
KNN Algorithm Supervised Learning Classifier: The classification model was implemented using the MATLAB Classification Learner app, where the data were processed, trained, and assessed with diverse models and different validation configurations and schemes, such as cross-validation or holdout validation. After the model performance was evaluated, the model was exported to MATLAB for further analysis.
Breathable Magnetic Skin: As shown in Figure S2, Supporting Information, the magnetic skin was prepared by mixing EcoFlex 00-50 (Smooth-On) with micromagnetic powder (MQP-16-7FP NdFeB). The mixing ratio was 1:1 wt% (i.e., 25 wt% EcoFlex part A, 25 wt% EcoFlex part B, and 50 wt% NdFeB). Then, the mixture was vacuum desiccated for about 10 min to remove air bubbles. After that, the mixture was molded into the desired shape and thickness, planarized using a casting knife, and left to dry at room temperature for 3 h.
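As a quick sanity check on the 25/25/50 wt% ratio, the component masses scale linearly with batch size. A trivial sketch (the function name and the 40 g batch size are illustrative, not from the paper):

```python
def batch_masses(total_g):
    """Component masses (in grams) for the 25/25/50 wt% EcoFlex A / EcoFlex B / NdFeB mix."""
    return {
        "EcoFlex part A": 0.25 * total_g,
        "EcoFlex part B": 0.25 * total_g,
        "NdFeB powder": 0.50 * total_g,
    }

# e.g., a 40 g batch -> 10 g part A, 10 g part B, 20 g NdFeB powder
print(batch_masses(40))
```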
Magnetic Skin Dimensions and Attachment to the User: Tracking the movement of the lips was achieved using 10 × 2 × 0.5 mm³ magnetic skin patches that were magnetized along the thickness. The patches were attached to the skin of the user using Vaseline (a biocompatible and clinically proven petroleum jelly).

Figure
Figure 3B shows the validation confusion matrix of the implemented Fine KNN model, achieved using the training dataset.

Figure 2.
Figure 2. Implementation of the AM2S-SR. A) Flow chart of the AM2S-SR. Mouth movements are translated into changes in the magnetic field. These changes are detected by the magnetic field sensors within the Magnetphone. The microcontroller translates these signals into vectors. The ML engine creates a model that classifies these vectors into the class with the highest probability. Once the model has a respectable accuracy, the signal can be displayed visually on a display, or played verbally using speakers. B) The Magnetphone device detects mouth movements while talking. When there is no mouth movement, there is no change in the magnetic field signal. Once the lips move, the magnetic skins move accordingly. This can be seen as the change in the magnetic field with time. C) The AM2S-SR hardware consists of a head-unit with an integrated microcontroller, multiplexer, and Bluetooth module, and two Magnetphones with built-in 3-axis magnetic field sensors. D) User with magnetic skin tattoos on the lower lip to allow detection by the Magnetphones of the AM2S-SR.

Figure 3.
Figure 3. KNN classifier model. A) Workflow of the classifier model for AM2S-SR. The raw data obtained from the Magnetphone and recorded in the head-unit are processed and prepared for the training dataset. Then, the 26 predictors (i.e., from A to Z) are selected, as well as the validation scheme (i.e., hold-out validation). The KNN classifier model is trained and tested with the training dataset and test dataset, accordingly. The model is exported for further analysis and testing. B) Validation confusion matrix for TPR and FNR. C) Test confusion matrix for TPR and FNR. In orange cells, the true class and the predicted class do not match.

Table 1.
Prediction of classes. Shows the success probability of predicting the correct class when a new observation from the test dataset is given to the KNN classifier model.

Table 2.
Grammatical similarities between true class and predicted class. Shows the similarities while pronouncing the letters A, J, and C (i.e., true class) and M, K, and Q (i.e., predicted class). This translates into similar movements in the anatomical planes (i.e., coronal, sagittal, and transverse planes) in which we visualize the movements of the mouth while we talk.