Intelligent Systems for Muscle Tracking: A Review on Sensor-Algorithm Synergy

establish the three primary sensor modalities used to obtain information on muscle activity: EMG, near-infrared spectroscopy (NIRS), and ultrasound. The second will establish the three classes of algorithms often used in conjunction with sensor modalities to obtain information about muscle activity: time-frequency domain, machine learning, and deep learning algorithms. Finally, the Applications Section will summarize the current state of technologies used to track muscles.

Physiology Background
Many conditions that necessitate neuromuscular rehabilitative therapies, like Parkinson's disease (a neurological disorder that causes neuromuscular dysfunctions), require extensive access to healthcare facilities and personnel support. Patients with these conditions would benefit significantly from noninvasive, wearable devices that continuously monitor muscle activity, both to enhance therapeutic precision and fidelity and to increase the independence of the individual. [2,3,5] To show the practical impact of muscle tracking in the treatment of neuromuscular diseases, we first describe the physiology behind muscle control, depicted in Figure 1, and then establish the prevalence of conditions that affect neuromuscular control of skeletal muscle.

Nervous System
Within the context of neuromuscular control, the two key components of the nervous system are the primary motor cortex and motor neurons. The motor cortex, which receives input from a number of areas, including the basal ganglia and cerebellum, is located on the medial-dorsal portion of the brain and is responsible for the voluntary control of skeletal muscle. [6] The transmission (TX) of a signal from the motor cortex to the muscle relies on the propagation of action potentials down a motor neuron, which relays the signal to the relevant muscle. Because motor neurons can project across long distances, sometimes exceeding 1 m in length, they are myelinated to increase action potential propagation speed. [7] Glial cells are also a key element responsible for increasing the reliability and speed of motor neuron signals throughout the body. [5] Once motor neurons reach the target muscle, they form a neuromuscular junction (NMJ).

Neuromuscular Junction
NMJs are the contact points between motor neuron terminals and myofibrils, allowing for the control and innervation of collections of distinct muscle fibers, or motor units (MUs), the smallest functional unit of muscle. [8] A typical NMJ consists of three main parts: i) a presynaptic part (motor neuron terminal), ii) a postsynaptic part (the motor end plate), and iii) the area between the two (synaptic cleft).

Motor Neuron Terminal
Once it reaches the target muscle fiber, the motor neuron loses its myelin sheath and branches into 100-200 nerve endings, called terminal buttons. The membrane of each terminal button has a number of voltage-gated ion channels, including potassium and calcium channels. In addition, each terminal button also contains synaptic vesicles (SVs), containers used to store neurotransmitters, in this case acetylcholine (ACh), the neurotransmitter responsible for communication across the synaptic cleft. When an action potential arrives at the terminal button, Ca2+ channels open, allowing a rapid influx of calcium into the terminal. The accumulation of Ca2+ causes a series of events allowing for the exocytosis of ACh from the SV into the synaptic cleft. [9][10][11]

Synaptic Cleft
The space between the motor neuron terminal (terminal button) and the motor end plate is called the synaptic cleft and measures around 50 nm. When the terminal button releases ACh into the synaptic cleft, the ACh traverses this distance to bind to the nicotinic ACh receptors (AChRs) on the motor end plate, thereby inducing a muscle action potential. To prevent prolonged stimulation of the motor end plate, the synaptic cleft of NMJs contains acetylcholinesterase, an enzyme responsible for the catabolism of ACh. [8,11]

Motor End Plate
The postsynaptic portion of the NMJ is called the motor end plate. Here, the muscle plasma membrane (the sarcolemma) is folded to form depressions called junctional folds, where nicotinic AChRs concentrate. Binding of ACh to these AChRs triggers the opening of sodium channels, allowing the influx of sodium ions from the extracellular fluid into the muscle fiber. The influx of positive ions in turn generates and transmits action potentials through the muscle fiber. [8][9][10]
In addition to the structural complexity of NMJs, a number of chemical mediators, like MuSK and agrin, are also involved in their formation. While their mechanism of action is outside the scope of this review, it is important to note that both contribute heavily to the formation of synapses and the aggregation of AChRs at the NMJ. [12,13] Because of the structural and chemical complexity of NMJ formation, there are a number of points of failure in this process that can lead to neuromuscular dysfunction, thus necessitating an understanding of the pathways involved.
Figure 2 depicts the structural makeup of skeletal muscle. Generally, a "single" muscle is made up of a set of muscle-fiber bundles, also known as muscle fascicles, each of which is made up of a number of muscle fibers. Each individual muscle fiber is surrounded by a membrane called the sarcolemma, within which lie the individual myofibrils and the sarcoplasmic reticulum, a network of specialized smooth endoplasmic reticulum designed to efficiently transmit electrical impulses and store calcium ions. A single motor neuron interfaces with muscle fibers specifically, allowing for exceptionally fine control of muscles in certain areas that cooperatively orchestrate the diverse cohorts of body movements [15,16] (Figure 3).

Skeletal Muscle
A single muscle fiber can be between 20 and 100 μm in diameter and many centimeters long; some are as long as 12 cm. [14] A single muscle fascicle may contain anywhere from 20 to 60 muscle fibers, surrounded by a connective sheath called the perimysium. Multiple motor neurons can innervate muscle fibers within each muscle fascicle, and a single MU may span multiple muscle fascicles. As such, a single MU can extend as much as 6 mm in diameter. [15] Each muscle fiber contains numerous myofilaments, of which there are two main types: actin (thin) and myosin (thick). Their roles must be understood in the context of activation. When ACh binds to AChRs, a number of sodium and calcium ion channels open in the sarcolemma, allowing for an influx of positive ions. This creates an action potential that can propagate down the myofibril due in part to the presence of voltage-gated ion channels across the sarcolemma. [16]
Figure 2. Hierarchy of skeletal muscle organization from the largest (muscle) to smallest (myofilament) structural unit. Reproduced with permission. [185] Copyright 2013, OpenStax.
Increasing local concentrations of calcium (Ca2+) beyond a threshold leads to calcium binding on the actin (thin) filaments, triggering a conformational change that in turn allows actin to bind to myosin. It should be noted that myosin is at baseline bound to adenosine diphosphate (ADP) and an inorganic phosphate group (Pi), the products of adenosine triphosphate (ATP) hydrolysis. When actin binds to myosin, ADP and Pi are released, allowing the actin filament to slide along the myosin filament, thus producing a muscle contraction. After contraction, ATP binds again to set the myosin filament to its start position, thus storing potential energy for the next contraction. [17] This process, occurring on larger, synchronized scales, is what allows for the coordinated control of skeletal muscles. It should be noted that the time between the stimulus to the motor neuron and the contraction of the innervated muscle, the latent period, is around 10 ms due to the time taken for the propagation of nerve action potentials, chemical TX across the NMJ, and then the excitation-contraction coupling in muscles. [18]

Clinical Relevance
The goal of the following section is to establish the prevalence of conditions that necessitate wearable, noninvasive, real-time muscle tracking devices: namely, conditions that can cause muscle weakness and tremors. A majority of these conditions can be divided into five categories on the basis of their cause: neuromuscular, endocrine, nutrition, medication, and toxin. It is also important to note that there are a number of acute conditions leading to muscle weakness, tremors, paralysis, or amputation. While the focus of the following section is on the chronic sources of muscle weakness and tremors, many of the applications discussed will be applicable to the symptoms of acute origin, like strokes and physical trauma. Table 1 also summarizes the results of our review.
Muscle weakness and tremors are both symptoms that necessitate extensive neuromuscular rehabilitation, often requiring extensive resources. In addition, many of these resources, such as rehabilitative equipment and physiotherapists, are only intermittently accessible to patients. For this reason, wearable muscle tracking devices are necessary: by actively tracking and assisting muscle activity, they can provide the rehabilitative assistance that patients with muscle weakness and tremors need to accomplish daily life activities. To accomplish this, the muscle tracker must be wearable and noninvasive, to ensure comfort and practicality of use, and must be able to analyze muscle activity in real time to quickly provide assistance for the user.
A number of wearable sensors have been proposed for rehabilitative purposes in the past. Carpinella et al., [19] for example, made use of wearable sensors to provide haptic and visual biofeedback for balance and gait training in individuals with Parkinson's disease, which is discussed in the following section. In addition, in 2018, Porciuncula et al. [20] published a review of the wearable movement sensors currently used for rehabilitation, demonstrating their use for rehabilitation from a variety of conditions, as well as their integration with robotics.

Neuromuscular
Neuromuscular conditions include a wide range of diseases that impede neuronal and muscular processes. Parkinson's disease, for example, affects 20 per 100 000 people. It is characterized by bradykinesia, muscular rigidity, and resting tremor, as well as hyposmia, sleep disorders, and depression. Parkinson's disease is caused by progressive degeneration of dopaminergic neurons, resulting in loss of dopaminergic function and consequent diminished motor function. Glutamatergic, cholinergic, serotonergic, and adrenergic systems are also involved in causing nonmotor features of the disease. [21] Multiple sclerosis has a prevalence of 35.9 per 100 000 people. [22] Patients with the disease experience muscle atrophy, joint stiffness, and shortness of breath. These symptoms arise because the immune system attacks the myelin sheaths of neurons, motor neurons in particular, causing nerve damage that disrupts communication from the motor cortex to the relevant skeletal muscle. These events lead to progressive degeneration of upper and lower motor neurons. [23] Myasthenia gravis (MG) has a prevalence of 7-16.3 per 100 000 people [24] and is characterized by severe muscle weakness, shortness of breath, impaired speech, and blurred or double vision. [25] In patients with MG, some antibodies produced by the immune system are directed against an individual's own proteins. These antibodies attack molecules like nicotinic AChRs, MuSK, and low-density lipoprotein receptor proteins, leading to reduced density and functionality of the AChRs and hence reduced neuromuscular TX. [25] Lastly, amyotrophic lateral sclerosis (ALS) is a deadly disease with a prevalence of 0.32 per 100 000 people, with only 10%-20% of ALS patients surviving beyond 10 years after onset. [26] Patients with ALS experience fasciculations, muscle weakness, cramping, and slurred speech. The most common cause of ALS is a mutation of the gene encoding the antioxidant enzyme superoxide dismutase 1 (SOD1). This mutation causes a misfold in the enzyme, leading to its aggregation in the motor neurons within the central nervous system (CNS), thereby disrupting communication between the brain and body. [27]

Endocrine
Various dysfunctions in the endocrine system can have neuromuscular effects. Cushing's syndrome (CS) affects roughly 2-5 people per million. [28] Those with CS overproduce the hormone cortisol, which can result from excessive production of adrenocorticotropic hormone (ACTH) by the anterior pituitary gland. CS can also occur due to diseases in the adrenal gland. The overproduction of cortisol can manifest symptoms such as muscle weakness. [29] Graves' disease is an endocrine disorder that can lead to extreme thyrotoxicosis, which can in turn trigger an event that causes increased availability of thyroid hormones. The symptoms that manifest from this disease and excess thyroid hormone include tremors, choreoathetosis, neuropsychiatric impairment, and myopathy. [30] Reports on the prevalence of Graves' disease conflict. A study in Korea found the prevalence of Graves' disease to be 2.76 per 1,000 people. [31] In contrast, a study in the United Kingdom found the prevalence of early onset Graves' disease in adults to be 24.8 per 100 000. [32] Diabetes insipidus (DI) has a prevalence rate of 1 per 25 000 people. DI occurs in response to a reduction in antidiuretic hormone (ADH), an important hormone used to reabsorb water from the kidneys back into the blood. DI can therefore result in excessive urination and dehydration, thereby potentially causing muscle weakness and tremors. Wolfram syndrome is a disorder that can lead to central DI and type 1 diabetes mellitus in children. [33] Wolfram syndrome has a prevalence rate of 1 in 770 000. Symptoms of Wolfram syndrome include myoclonic tremor, cerebellar ataxia, and dysarthria. [34] Wolfram syndrome occurs due to mutations that cause pancreatic beta-cells to undergo apoptosis. In addition, Wolfram syndrome causes demyelination as well as degradation of the nervous system. [35]

Table 1. Prevalence, causes, and symptoms of common conditions of clinical relevance to the topic of muscle control and sensors. All conditions are separated by class: neuromuscular (NM), endocrine (EC), vitamin deficiency (VD), toxin (TN), and medication (MD).
Class | Condition | Cause | Symptoms | Prevalence
NM | … | … | Muscle weakness, shortness of breath | 7-16.3 per 100 000
NM | Amyotrophic lateral sclerosis [26] | Congenital (SOD1 mutation) | Muscle weakness, myoclonus | 0.32 per 100 000
EC | Cushing's syndrome [28,29] | ACTH overproduction | Muscle weakness | 0.2-0.5 per 100 000
EC | Graves' disease [30][31][32] | Thyrotoxicosis | Tremors, myopathy | 24.8-276 per 100 000
EC | Diabetes insipidus [33] | ADH deficiency | Dehydration, muscle weakness | 4 per 100 000
VD | Vitamin B12 deficiency [36][37][38] | Crohn's and celiac disease; diet | Tremors, ataxia | N/A
VD | Vitamin B5 deficiency [39][40][41] | PKAN; Huntington's disease; diet | Tremors, abnormal gait, paresthesia | N/A
VD | Magnesium deficiency [42] | … | … | …
MD | … [44] | Side effect | Tremors | 2%-4% of users
MD | Selective serotonin reuptake inhibitors (SSRIs) [44] | Side effect | Tremors | 20% of users
MD | Dopamine blocking agents [45] | Side effect; disruption of nigrostriatal dopaminergic neurotransmission | Tremors | N/A
MD | Immunosuppressants (cyclosporine, tacrolimus) [45] | Side effect; affect dopaminergic transmission | Tremors | N/A
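The prevalence figures quoted in this section use different denominators (per 1,000, per 25 000, per million). As a quick arithmetic check, the sketch below converts them to a common per-100 000 basis. The helper name `per_100k` is our own illustrative choice and is not taken from any of the cited studies.

```python
def per_100k(cases: float, population: float) -> float:
    """Convert 'cases per population' into cases per 100 000 people."""
    return cases / population * 100_000

# Figures as quoted in the text above
graves_korea = per_100k(2.76, 1_000)      # 2.76 per 1,000 people
diabetes_insipidus = per_100k(1, 25_000)  # 1 per 25 000 people
wolfram = per_100k(1, 770_000)            # 1 in 770 000 people
cushing_low = per_100k(2, 1_000_000)      # 2 per million people
cushing_high = per_100k(5, 1_000_000)     # 5 per million people

for name, value in [
    ("Graves' disease (Korea)", graves_korea),
    ("Diabetes insipidus", diabetes_insipidus),
    ("Wolfram syndrome", wolfram),
    ("Cushing's syndrome (low)", cushing_low),
    ("Cushing's syndrome (high)", cushing_high),
]:
    print(f"{name}: {value:.2f} per 100 000")
```

On this common basis, the Korean estimate for Graves' disease is roughly an order of magnitude higher than the United Kingdom estimate of 24.8 per 100 000, which is the conflict noted in the text.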

Nutritional Deficiencies
Vitamin deficiencies are often due to underlying conditions that impair nutrient absorption. For example, common causes of vitamin B12 deficiency include a number of gastrointestinal diseases, including Crohn's and celiac disease. Vitamin B12 is important for nervous system maintenance, [36] with its absence causing demyelination in both the peripheral and central nervous systems. [37] Vitamin B12 deficiency can be associated with symptoms such as tremor, ataxia, and fatigue in young children. [38] Vitamin B5, or pantothenic acid, is important for the synthesis of coenzyme A (CoA). CoA's role in the body includes anabolic and catabolic reactions that synthesize organic molecules, such as fatty acids, cholesterol, and ACh. Vitamin B5 deficiency can affect the nervous system, causing symptoms such as muscle cramping, paresthesia, irritability, and faulty coordination that is associated with tremors. [39] Vitamin B5 deficiency can occur due to pantothenate kinase-associated neurodegeneration (PKAN), an autosomal recessive disorder resulting in an inability to produce CoA from vitamin B5. [40] Vitamin B5 deficiency has also been associated with Huntington's disease, an autosomal dominant disorder, due to its impact on neurodegeneration and dementia, which are the main contributors to the disease. [41] Magnesium deficiencies are often caused by underlying issues such as type 2 diabetes and Crohn's disease. Because magnesium plays an important role in a number of enzyme-mediated reactions, including glutathione synthesis, its deficiency is associated with symptoms such as tremors, fasciculations, and spasms. Low magnesium levels are often accompanied by low levels of calcium and potassium as well, which in turn can also cause tremors and spasms. [42]

Toxins
Heavy metal exposure, acute or chronic, often leads to muscle tremors and permanent neurological damage. Acute exposure to heavy metals often induces encephalopathy. Symptoms include poor memory, headaches, and postural tremors. Long-term exposure to heavy metals will often cause harm to peripheral nerves. This harm leads to permanent muscle weakness in the extremities. Some metals, such as manganese, have more specific and intense effects on the human body. Exposure to excess manganese, such as during poorly ventilated welding, can damage the basal ganglia, a portion of the brain responsible in part for motor control, leading to symptoms that closely align with Parkinson's disease. Carbon disulfide has also been found to cause damage to the basal ganglia. Toluene, a compound found in paint, nail polish, and lacquers, can over prolonged exposure cause peripheral nerve damage and symptoms similar to heavy metal exposure. [43]

Medications
Muscle tremor is a common side effect of certain medications. While the exact physiological mechanism is unknown, beta-adrenergic agonists cause tremors in 2%-4% of users. [44] In addition, some antidepressants may cause tremors; selective serotonin reuptake inhibitors (SSRIs) lead to some form of muscle tremor in 20% of users, though the mechanism is still unknown. Dopamine-blocking agents can also cause muscle tremors by disrupting nigrostriatal dopaminergic neurotransmission. Immunosuppressants, such as cyclosporine and tacrolimus, cause tremors by affecting dopaminergic neurotransmission. [45]

Sensor Modalities
The following section is dedicated to explaining the current devices used to image and track muscle activity. More specifically, we will focus on sensor modalities used to elucidate internal muscle activity, not solely kinematic information. Generally, these devices can be characterized by their output: structural or functional data. For the purposes of this review, structural data and functional data refer to outputs that describe the physical conformation and the activity of skeletal muscle, respectively. For outputs that accomplish both tasks simultaneously, we will refer to them as structural-functional data. The following sections will detail the mechanisms of three sensor modalities: EMG, NIRS, and ultrasound. Each of these modalities can be characterized by their ability to produce structural or functional data, though some can accomplish both with varying degrees of success. Table 2 summarizes temporal and spatial resolution, as well as the sensing depth for each sensor modality.

Electromyography
The general premise of EMG is to measure the electrical activity in a given muscle over a period of time, and correlate changes in voltage to changes in muscle activity, be it concentric (muscle shortening), eccentric (muscle elongation), or isometric (constant muscle length) contractions. [46] In particular, EMGs can easily capture isometric contractions because the captured signals relate to electrical activity rather than to a physical change in muscle length. In contrast, a number of other structural imaging modalities, including NIRS and ultrasound, have difficulty resolving isometric contractions because they primarily capture changes in muscle length instead of electrical activity. Thus, the high-resolution temporal data on muscle activity obtained by EMGs (generally between 800 and 1000 Hz [47]) highlight their excellent capabilities in functional imaging. However, the spatial resolution of EMGs is heavily limited not only by the number, size, and type of recording electrodes but also by the electrical dispersion inside tissue, making imaging with high spatial resolution challenging. [48]

Physiology
EMG signals are a measurement of the electrical activity of muscle fibers active at a given point in time. More specifically, this electrical activity is a function of the depolarization of muscle fibers in response to stimulation by motor neurons. When a motor neuron stimulates a muscle fiber to contract, a number of ion channels open, allowing for the flow of charged particles (ions). The rate of flow of charge (Q), also known as current (I), is directly related to the electrical potential across the muscle membrane. [49] When comparing the electric potential between two points, we can in turn obtain the voltage.
The summative changes in voltage as a result of large-scale muscle activation are captured by EMGs and interpreted to provide information about the muscle contraction itself. [50] For more information on the mathematical theory behind EMG signals, see the Supporting Information section.
Generally, voltage from muscle activity is measured through either surface electrodes or needle electrodes. The former is used to obtain surface EMGs (sEMG), while the latter is used to obtain intramuscular EMGs (imEMG). The key difference between the two comes from the source of the signal. Because EMG signal sources are located in the depolarized zones of muscle fibers, an sEMG collects the signals that have been attenuated through skin, vasculature, and fat lying between the electrode and the muscle, while an imEMG directly probes sources of the signal with little attenuation. Because biological tissue absorbs higher frequency signals more easily, it can serve as a spatial low-pass filter [51] thus deforming the signals transmitted through. It is partly for this reason intramuscular recordings can be more reliable, as their closer proximity to the signal source in comparison to sEMGs can mitigate the impact of surrounding tissue on signal quality.

Instrumentation
As mentioned earlier, there are two main configurations of EMGs: sEMG and imEMG. The former relies on placing an electrode on the surface of the skin, above the target muscle, while the latter relies on placing a needle inside the target muscle. When considering the impact these two configurations have on signal quality, it is important to understand how tissue is analogous to circuitry.
When sEMG is employed, there are four additional components of tissue that interfere with extracting signals from the muscle: skin (stratum corneum, epidermis, and dermis), vasculature, sweat glands, and subcutaneous fat. In addition, a gel must be placed between the electrode and skin to allow the electrode to interface with the skin without air gaps, adding another layer of resistance. The layers of tissue can be considered analogous to a set of resistor/capacitor circuits, as shown in Figure 4a, thus resulting in a significant reduction in signal intensity from the biopotential source to the electrode. In contrast, imEMGs use needles to bypass the tissue, thus minimizing the reduction in the intensity of signal.
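To make the resistor/capacitor analogy concrete, the sketch below treats one tissue layer as a single first-order RC low-pass stage and computes its cutoff frequency and gain. This is the simplest possible approximation and is not the circuit of Figure 4a; the component values `R_TISSUE_OHM` and `C_TISSUE_F` are hypothetical and were chosen only to give a cutoff inside the EMG band.

```python
import math

def rc_cutoff_hz(resistance_ohm: float, capacitance_f: float) -> float:
    """Cutoff (-3 dB) frequency of a first-order RC low-pass stage."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_f)

def rc_gain(freq_hz: float, resistance_ohm: float, capacitance_f: float) -> float:
    """Magnitude of the RC low-pass transfer function at freq_hz."""
    fc = rc_cutoff_hz(resistance_ohm, capacitance_f)
    return 1.0 / math.sqrt(1.0 + (freq_hz / fc) ** 2)

# Hypothetical lumped values for one tissue layer (assumptions, not measured)
R_TISSUE_OHM = 100e3
C_TISSUE_F = 10e-9

cutoff = rc_cutoff_hz(R_TISSUE_OHM, C_TISSUE_F)
print(f"cutoff: {cutoff:.1f} Hz")
for f in (20, 100, 500):
    print(f"gain at {f} Hz: {rc_gain(f, R_TISSUE_OHM, C_TISSUE_F):.3f}")
```

Under these assumed values, higher-frequency components are attenuated more strongly than lower-frequency ones, which is the qualitative behavior attributed to the tissue layers above.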
Electrode Design: Generally, surface electrodes are made of silver/silver chloride (Ag/AgCl), silver (Ag), silver chloride (AgCl), or gold (Au). However, Ag/AgCl electrodes are preferable because they are nonpolarizable, thus making the electrode-skin impedance a function of resistance, not capacitance. This in turn makes the surface electrode less sensitive to the movement of the electrode relative to the skin and increases the signal-to-noise ratio (SNR). [52] Electrodes can also vary in size, with smaller electrodes gathering lower intensity signals but also allowing for finer spatial resolution of muscle activity. Generally, electrodes are around 10 mm in diameter to maximize spatial resolution and minimize crosstalk from surrounding muscles. [53] Surface-recording EMGs may lose many fine details of signal features due to the low-pass filtering that occurs when conducting across the skin. Intramuscular electrodes overcome this issue by being inserted into the contracting muscle and recording individual MU action potentials. Depending on the type of electrode used and its location, the recorded action potentials can be the result of the activity of a small (1-3), moderate (15-20), or large (more than 20) number of muscle fibers. [54] Two commonly used needle recording electrodes are monopolar and concentric. The monopolar needle is a solid, stainless steel shaft coated completely with Teflon, except for the bare metal tip, which acts as the recording surface. The concentric needle electrode is a hollow, stainless steel hypodermic needle with a platinum or Nichrome silver wire, surrounded by an epoxy resin that insulates the material. The primary difference between the two is the recording volume, that is, the portion of a motor-unit territory each covers. Generally, concentric needles are of higher resolution due to their smaller diameter. However, insertion is invasive, and positioning the needle at the desired MU is difficult, limiting the broad applicability of imEMG despite its higher spatial resolution. [55] As such, for the purposes of this review, we will focus on sEMG applications as they are more easily implemented in wearable devices.

Table 2. Temporal and spatial resolutions, as well as penetration depth, for each sensor modality.
Sensor modality | Temporal resolution | Spatial resolution | Penetration depth
sEMG | 800-1000 Hz, though this is subject to variation depending on the muscle target, number of electrodes, and orientation [47] | Variable, but low; in clinical settings, generally only capable of distinguishing the activity of distinct portions of muscles [48] | Variable, but low; attenuation of electric signal depends on the tissue lying above, but in general, deeper tissues are harder to detect and separate [48]
NIR spectroscopy | Between 5 and 100 Hz, depending on the type of functional imaging, for wearable NIR spectrometers [64] | Variable, as low as millimeter-level resolution; in our search, we found only a few wearable devices for skeletal muscle imaging, some of which can attain millimeter-level resolution (Hamaoka et al. [63]) | Variable, around 2 cm; dependent on intensity of input light, as attenuation of signal is percentage-based [65]
Ultrasound | Between 25 and 204 Hz for M-Mode ultrasounds imaging skeletal muscle, though this is also dependent upon the scanning area and number of transducers [78] | Axial resolution: 0.25-4.0 mm; lateral resolution: 0.5-5 mm; both axial and lateral resolution are dependent on depth and transducer probe type [78] | 3-17 cm, depending on transducer probe type and frequency [79]
Electrode Configuration: Generally, there are two ways to configure electrodes, be they surface or needle: monopolar and bipolar. [56] Monopolar EMGs record the electrical potential from one point along the muscle tissue and from a reference electrode. The difference between these two points yields the measured voltage. This configuration, while attaining a higher detection volume (the amount of signal actually detected), is more susceptible to interference and noise. Bipolar EMGs record the difference between two monopolar EMGs. Because many sources of noise will affect both recording electrodes similarly, such noise can be attenuated by subtracting the two monopolar EMGs (creating a differentiated signal). Though this results in a recording less sensitive to noise, it also reduces detection volume.
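The trade-off between noise rejection and detection volume can be illustrated with synthetic data. In the sketch below, both electrodes pick up identical power-line interference while the muscle component differs between them; all waveforms, amplitudes, the 0.2 coupling factor, and the sampling rate `FS_HZ` are assumptions made for illustration and do not represent recorded EMG.

```python
import math

FS_HZ = 1000       # assumed sampling rate
N_SAMPLES = 1000   # one second of synthetic data

def sine(freq_hz: float, amplitude: float) -> list:
    """Synthetic sinusoid sampled at FS_HZ."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / FS_HZ)
            for i in range(N_SAMPLES)]

def rms(signal: list) -> float:
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))

muscle = sine(80, 1.0)  # stand-in for muscle activity (hypothetical)
hum = sine(50, 5.0)     # interference common to both electrodes

# Electrode A sits closer to the source than electrode B (assumed geometry)
electrode_a = [m + h for m, h in zip(muscle, hum)]
electrode_b = [0.2 * m + h for m, h in zip(muscle, hum)]

# Bipolar recording: the common interference cancels in the subtraction
bipolar = [a - b for a, b in zip(electrode_a, electrode_b)]

print(f"monopolar RMS (signal + hum): {rms(electrode_a):.3f}")
print(f"bipolar RMS (hum removed):    {rms(bipolar):.3f}")
print(f"muscle-only RMS:              {rms(muscle):.3f}")
```

In this toy case the subtraction removes the shared interference entirely, but the remaining muscle component is smaller than the one seen by electrode A alone, mirroring the reduced detection volume described above. Real interference is rarely identical at both electrodes, so cancellation in practice is partial.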

Analog Signal Processing
In the case of a bipolar EMG setup, as described in Figure 4b, differential amplifiers can be used to multiply the difference between the two voltage signals by a constant value, the gain. In the case of sEMG, amplifiers should have a high input impedance to minimize power line interference from unbalanced impedance in the electrode-skin interface. They should also have a high common-mode rejection ratio, a metric of a device's ability to reject common-mode signals (those that appear simultaneously and in-phase on both input electrodes), to ensure cancellation of common-mode voltages between surface electrodes. After the amplification stage, bandpass filters are used to filter the signal prior to digital processing. More specifically, the low-frequency cutoff (5-20 Hz) [57] removes the baseline drift of the signal associated with movement and perspiration, while the high-frequency cutoff (200-1000 Hz) removes high-frequency noise and prevents aliasing. Finally, the analog signal is converted to a digital signal and processed through software. Of particular note is the sampling rate, which must be high enough to convert the analog signal to digital without aliasing.
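The quantities in this paragraph can be sketched numerically. The example below uses the standard textbook model of a differential amplifier (output equals differential gain times the input difference plus common-mode gain times the input average), the usual decibel definition of common-mode rejection ratio, and the Nyquist criterion for the minimum sampling rate. The function names and the gain values are illustrative assumptions, not specifications of any particular EMG amplifier.

```python
import math

def cmrr_db(differential_gain: float, common_mode_gain: float) -> float:
    """Common-mode rejection ratio in decibels."""
    return 20.0 * math.log10(abs(differential_gain / common_mode_gain))

def amplifier_output(v_plus: float, v_minus: float,
                     differential_gain: float, common_mode_gain: float) -> float:
    """Output of a non-ideal differential amplifier."""
    v_diff = v_plus - v_minus        # wanted signal between the electrodes
    v_cm = (v_plus + v_minus) / 2.0  # voltage shared by both electrodes
    return differential_gain * v_diff + common_mode_gain * v_cm

def min_sampling_rate_hz(highest_frequency_hz: float) -> float:
    """Nyquist criterion: sample at least twice the highest frequency present."""
    return 2.0 * highest_frequency_hz

# Assumed gains: 1000 for the differential signal, 0.01 for common-mode
A_DIFF, A_CM = 1000.0, 0.01
print(f"CMRR: {cmrr_db(A_DIFF, A_CM):.0f} dB")

# 1 mV differential signal riding on roughly 1 V of common-mode voltage
print(f"output: {amplifier_output(1.001, 1.000, A_DIFF, A_CM):.6f} V")

# With a 500 Hz high-frequency cutoff, sample at 1000 Hz or faster
print(f"minimum sampling rate: {min_sampling_rate_hz(500):.0f} Hz")
```

With these assumed gains, the 1 mV difference is amplified to about 1 V while the 1 V common-mode voltage contributes only about 10 mV to the output.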

Digital Preprocessing
Generally, the SNR for EMGs is affected by electrical noise, motion artifacts, and environmental conditions, like humidity and conductivity of the skin. Approaches to improve the SNR range from device engineering to data processing. The former favors a dry, hairless, and conformal interface between the skin and the electrode to minimize additional resistance and air gaps, as well as to prevent the gradual slipping of the electrode. Still, advanced software tools are often required to further process EMG signals.
One commonly used method is to first rectify the input signal and then apply a digital low-pass filter. After taking the absolute value of the signal (full-wave rectification), applying a low-pass filter returns an 'envelope' of the original signal. One way to low-pass filter a signal is by applying a moving-average window, where a mean value is taken in a window which "slides" along the temporal dimension. Another way is to use a discrete version of a traditional low-pass filter, like Butterworth [58] or Chebyshev. [59] Another common way to capture an EMG envelope is to compute the root mean square (RMS) value of the signal within a window that slides across the signal (a moving RMS filter). Mathematically, this approach is only slightly different from the rectify and low-pass approach, as demonstrated by De Luca. [60]
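Both envelope methods described above can be sketched in a few lines. The version below is a minimal illustration that assumes a trailing window (each output sample averages the current and preceding samples, with shorter windows at the start of the signal); the function names and the window length are our own choices, and a production pipeline might instead use a centered window or a Butterworth filter.

```python
import math

def rectify(signal: list) -> list:
    """Full-wave rectification: absolute value of every sample."""
    return [abs(v) for v in signal]

def moving_average(signal: list, window: int) -> list:
    """Trailing moving average; early samples use the data available so far."""
    out = []
    running_sum = 0.0
    for i, value in enumerate(signal):
        running_sum += value
        if i >= window:
            running_sum -= signal[i - window]  # drop sample leaving the window
        out.append(running_sum / min(i + 1, window))
    return out

def envelope_rectify_lowpass(signal: list, window: int) -> list:
    """Envelope via rectification followed by a moving-average low-pass."""
    return moving_average(rectify(signal), window)

def envelope_moving_rms(signal: list, window: int) -> list:
    """Envelope via a moving RMS: square, average, then square root."""
    mean_squares = moving_average([v * v for v in signal], window)
    return [math.sqrt(max(m, 0.0)) for m in mean_squares]  # guard rounding

# Synthetic burst: quiet, active, quiet (hypothetical data)
emg = [0.0] * 5 + [1.0, -1.0] * 5 + [0.0] * 5
print(envelope_rectify_lowpass(emg, window=4))
print(envelope_moving_rms(emg, window=4))
```

For a signal of constant magnitude the two envelopes coincide; they diverge when the amplitude varies within the window, because squaring weights larger samples more heavily than the absolute value does.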

Near-Infrared Spectroscopy
NIRS utilizes the absorption contrast of NIR light among various tissues to resolve the internal structure underneath the skin, especially muscular tissues. In addition, because components of blood, like hemoglobin and myoglobin, absorb infrared light strongly, the activity of internal structures can be estimated by changes in blood flow to specific areas. [61,62] Because NIRS relies upon measuring the TX or reflectance of light propagated through certain depths of tissue, it produces a 2D compression of 3D features, yielding an image that is effectively a summation over the depth plane. As a result, NIRS can produce absorption signals with moderate spatial and temporal resolution across a sample. From our review, it is difficult to establish a consistent spatial resolution of wearable NIR spectrometers for skeletal muscle imaging, though Hamaoka et al. [63] have achieved millimeter-level resolution using a 200-channel NIR spectrometer for functional imaging of the quadriceps. The temporal resolution of wearable NIR spectrometers can range from 5 to 100 Hz, depending on the type of spectrometer. [64] The penetration depth of NIR light is dependent in part on the intensity of light, as attenuation is percentage-based. However, wearable devices generally have a penetration depth of approximately 2 cm. [65]
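The statement that attenuation is percentage-based can be read as exponential decay: each additional unit of depth removes the same fraction of the remaining light. The sketch below assumes a simple one-way Beer-Lambert-type model with a single effective attenuation coefficient. The coefficient `MU_PER_CM` and the detection floor are hypothetical values chosen for illustration, and real tissue requires scattering-aware models that this sketch does not attempt.

```python
import math

def remaining_intensity(i0: float, mu_per_cm: float, depth_cm: float) -> float:
    """Intensity left after depth_cm of tissue under exponential attenuation."""
    return i0 * math.exp(-mu_per_cm * depth_cm)

def max_depth_cm(i0: float, i_min: float, mu_per_cm: float) -> float:
    """Depth at which intensity falls to the detection floor i_min."""
    return math.log(i0 / i_min) / mu_per_cm

MU_PER_CM = 2.0  # hypothetical effective attenuation coefficient
I_MIN = 1.0      # hypothetical detection floor, arbitrary units

for i0 in (50.0, 100.0, 200.0):
    depth = max_depth_cm(i0, I_MIN, MU_PER_CM)
    print(f"input {i0:5.0f} -> usable depth {depth:.2f} cm")
```

Under this model, doubling the input intensity adds only a fixed increment of depth (ln 2 divided by the attenuation coefficient), which is one way to see why penetration depth depends on light intensity yet stays within a narrow range.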

Optical Physics
NIRS falls under the umbrella of vibrational spectroscopic techniques, as it is based upon the molecular absorption of NIR radiation. [66] More specifically, it is based upon how biologically important bonds, like O─H, C─H, N─H, and S─H bonds, vibrate in response to the absorption of light. When bonds absorb sufficient energy from photons, they can transition to different vibrational levels. Generally, bonds transition between vibrational levels one at a time, but with a sufficiently large energy input they can transition more than one level at a time, resulting in the creation of overtones. The mathematical theory behind vibrational absorption can be explained by anharmonic oscillation, a model describing the vibrational energy of two vibrating masses connected by a spring, or two atoms connected by a bond. For the purposes of this review, the mathematical theory behind this model isn't relevant. However, it is important to note that this model correctly predicts that bonds can transition more than one vibrational level at a time, and that there is a critical vibrational energy threshold beyond which chemical bonds can dissociate. For more information on the anharmonic oscillator model, see the Supporting Information section.
The transfer of energy from photons to molecules can occur through molecular vibrations, determined in part by molecular polarity. More specifically, the degree of dipole moment change in a molecular bond within a vibrational transition determines the intensity of light absorption. Generally, heteronuclear, diatomic molecules undergo a greater change in the degree of the dipole moment, thus allowing them to absorb more energy. It is for this reason O─H, C─H, and N─H bonds are especially likely to vibrate in response to light absorption. Figure 5a depicts the absorbance spectra of C─H, O─H, and N─H bonds across the NIR spectrum. It should be noted that such bonds can absorb light in multiple overlapping locations along the NIR spectrum.
In addition, because a single molecule may have many distinct bonds, multiple vibrational patterns can be achieved by each separate bond, resulting in combinations of vibrational levels or combination vibrations. [67] Bonds involving hydrogen are of special importance for our purposes because of the composition of amino acids: all amino acids are composed of a carboxyl group (COOH), an amine group (NH2), an atom of hydrogen (H), and a variable functional group (R). The feature distinguishing amino acids from one another is this functional group, R. [68] However, even disregarding the R group, the fundamental structural units of amino acids contain a number of heteronuclear bonds involving hydrogen, making them especially capable of converting light into vibrational energy. Most structural units of the body, be they tendon, muscle, or ligament, are composed of amino acids. Myosin, for example, is largely composed of glutamic acid, leucine, arginine, and lysine, as shown in Figure 6. Since every structural unit of the body is composed of a unique combination of amino acids, they each uniquely absorb NIR light, thus making them distinguishable in NIRS.
Of course, biologically important bonds exist throughout the body, including in blood: more specifically, hemoglobin. Figure 5b depicts the absorbance spectrum of hemoglobin, when bound (HbO2) and unbound (Hb) to oxygen, across the NIR range. Functional NIRS leverages the absorption capabilities of hemoglobin to estimate muscle activity due to the strong dependence of skeletal muscle on oxidative metabolism. During activity, skeletal muscle oxygen consumption (VO2) can rise 50-fold, while oxygen delivery can increase up to 10-fold. The net oxidative energy pathway in muscles can be described by the following equation [63]

$$6\,\mathrm{ADP} + 6\,\mathrm{P_i} + 2\,\mathrm{NADH} + 2\,\mathrm{H^+} + \mathrm{O_2} \rightarrow 6\,\mathrm{ATP} + 2\,\mathrm{NAD^+} + 2\,\mathrm{H_2O}$$

where ADP is adenosine diphosphate, P_i is the inorganic phosphate, NADH is the reduced nicotinamide adenine dinucleotide, ATP is adenosine triphosphate, and NAD+ is nicotinamide adenine dinucleotide. Because myofibrils require ATP to reset the myosin filament to its starting position, and muscle contractions cause ATP to dissociate into ADP and P_i, it is apparent that each muscle contraction results in the need for oxygen to produce enough ATP to reset the muscle fiber.
Figure 5. a) Depiction of the absorbance spectra within the NIR range of C─H, O─H, and N─H bonds. Adapted with permission. [186] Copyright 2018, Abvista. b) Depiction of the absorptivity, measured by the molar extinction coefficient, of hemoglobin bound (HbO2) and unbound (Hb) to oxygen. Adapted with permission. [187] Copyright 1999, OMLC.
Figure 6. The amino acid composition of myosin, one of the two primary myofilaments making up skeletal muscle. Reproduced with permission. [188] 2009, BJSM.
Since oxygen is transported to skeletal muscle via hemoglobin in blood, when skeletal muscle needs more oxygen to sustain activity, the vasculature nearby dilates to allow more blood, and in turn oxygen, into the area. As such, many NIR devices measure muscle oxygenation (NIRS oximeters) as an indirect metric of muscle activity. [69]

Instrumentation
Because NIRS can be used for both structural and functional imaging of skeletal muscle, the latter of which can rely upon repeated sampling of spatial absorbance or on estimating blood oxygen dynamics within tissue, there is no standardized hardware setup for NIRS. However, in general, NIRS is accomplished through the use of three components: a light source, a photodetector (PD), and an analog filter. A depiction of the instrumentation is shown in Figure 7.
The light source-generally a light-emitting diode (LED) in the case of wearable, functional NIR machines-is selected on the basis of what the desired wavelength is, as well as its brightness. In many cases, an array of LEDs may be used to ensure consistent lighting of a given area of tissue. Generally, infrared LEDs are made out of gallium arsenide or aluminum gallium arsenide due in part to their efficiency in producing light as opposed to heat. In addition, the maximal optical power for LEDs in wearable applications is typically below 10 mW for the safety of the user, [70] while the wavelength selection exhibits significant range depending on the target output data-structural or functional data-though often 760 and 850 nm light will be used for functional imaging.
The PD is used to convert the intensity of light absorbance into an electrical current output. Generally, PDs have a p-n junction that converts photons into current, making the key characteristic of PDs their ability to sensitively respond to light absorption. In addition, many photodiodes are built to restrict the acceptance of light outside a certain angle range, thus mitigating the impact of external light on the signal. Conversely, collecting filters can be placed on top of the photodiode to collect more light. Often, infrared PDs are made of silicon, germanium, or indium gallium arsenide, though silicon-based PDs tend to have less noise than germanium-based PDs due to their greater bandgap (the energy range in which no electronic states can exist). [71]
Analog filters used in the case of NIRS do not necessarily act on the frequency domain (FD) of the photodiode output signal. Instead, NIR bandpass filters [72] can be used to filter wavelengths of light before the light reaches the photodiode itself. This can be used to actively reflect nonessential wavelengths of light, allowing only the target wavelength to be transmitted through. It should be noted that standard bandpass filters and amplifiers can still be used after the collection of light intensity.

Digital Preprocessing
Figure 7 depicts the general pathway for the digital preprocessing of NIRS data. It should be noted that many of these steps can be altered, performed in a different order, or skipped depending on the quality of the acquired signal and the desired output. However, most preprocessing algorithms can be divided into one of four categories: smoothing, scatter correction, spectral derivatives, and outlier detection.
Figure 7. Depiction of the signal acquisition and preprocessing pipeline for NIR spectrometers. Steps 1-5 depict the instrumentation of NIR spectrometers, while steps 6-9 depict the order of steps taken for the digital preprocessing of acquired signals.
Smoothing: Because NIR spectral data often contain environmental noise, spectral data need to be smoothed prior to analysis. This is often accomplished with the Savitzky-Golay filter. [73] Here, a moving window is stepped across the NIR spectral data, and the data within each window are fitted with, and replaced by, a low-degree polynomial.
Scatter Correction: Because biological tissue contains variable-sized structures, NIR light is often scattered differentially. To correct this, one can use baseline removal (remove the minimum absorbance value of the spectral data). Standard normal variate [74] and multiplicative scatter correction [75] have also been used to accomplish this. The former essentially standardizes the input data, while the latter fits each input spectrum to a template with least-squares regression, where the template is generally the mean of the dataset.
Spectral Derivative: First and second derivatives of spectral data can suppress absorption from broader peaks of the spectrum, thus highlighting subtle spectral features where necessary. Outlier detection can also be performed at this stage.
Outlier Detection: Generally, Z-scores, Tukey's method, and median absolute deviation can serve as initial methods to remove outliers. In addition, when multiple infrared spectra are in use, principal component analysis (PCA) can be used to identify additional outliers.
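Three of the preprocessing steps above (smoothing, scatter correction, and spectral derivatives) can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the smoothing uses the standard 5-point quadratic Savitzky-Golay coefficients, and the two points at each edge are left unchanged for simplicity.

```python
import math

# 5-point quadratic Savitzky-Golay smoothing coefficients
SG5 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]

def savgol5(spectrum):
    """Smooth interior points; the two points at each edge are left unchanged."""
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        out[i] = sum(c * spectrum[i + k - 2] for k, c in enumerate(SG5))
    return out

def remove_baseline(spectrum):
    """Baseline removal: subtract the minimum absorbance value."""
    lowest = min(spectrum)
    return [x - lowest for x in spectrum]

def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own mean and std."""
    mean = sum(spectrum) / len(spectrum)
    std = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (len(spectrum) - 1))
    return [(x - mean) / std for x in spectrum]

def first_derivative(spectrum):
    """First spectral derivative as a finite difference between neighboring points."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]
```

Because the smoothing filter fits a quadratic inside each window, it leaves an exactly quadratic spectrum unchanged while attenuating point-to-point noise.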

Ultrasound
An ultrasound machine is one that generates an acoustic wave that can be transmitted through various materials. The frequency of an ultrasound wave is higher than the human upper auditory limit (≈20 kHz), [76] being generally between 2 and 15 MHz. [77] Generally, 7.5 MHz is used to image skeletal muscle and adjusted from there. Ultrasound machines generate ultrasound waves and receive the reflected echoes through the use of piezoelectric crystals, using the intensity and speed of reflected waves to indicate the type of material encountered. More specifically, because different tissues have differing stiffness, their different reactions to ultrasound waves can be used to characterize internal tissues. Though there are many types of ultrasound devices and modes, in general, ultrasounds provide good spatial and temporal resolution data. Generally, ultrasounds provide a spatial resolution between 0.5 and 5 mm and a temporal resolution between 25 and 204 Hz. [78] In addition, depending on the type of transducer probe and frequency, ultrasounds can provide information up to 17 cm deep. [79] However, assessment of muscle activity is limited to perceptible changes in muscle thickness, making isometric contractions difficult to measure.

Acoustic Physics
Tissue Interaction: As an ultrasound wave travels through tissue, there are generally four interactions that occur: refraction, reflection, scattering, and absorption. When ultrasound encounters boundaries between different media, such as fat and muscle, part of the ultrasound can be reflected and the other transmitted, as shown in Figure 8. The reflected and transmitted directions are given by the reflection angle θ r and the TX angle θ t . [80] The strength of reflection from an interface is dependent upon the difference in impedance between the two media and the incident angle at the boundary. [81] If the media impedances are equivalent, then there is no reflection and so no echo to record. However, if the difference between the media impedances is large enough, then there can be a nearly complete reflection, allowing for the detection of a tissue boundary by the ultrasound. For example, the interface between soft tissue and bone presents such a stark difference in acoustic impedance that a strong echo is created and easily measured by ultrasound. The reflection intensity is also angle-dependent, meaning that in practice, the ultrasound transducer must be placed perpendicular to the target tissue to visualize it adequately. In addition, because sound will propagate at different speeds in different media, there will be a change in sound direction, or refraction, at the boundary. This change in direction can be explained by Snell's Law. [80] During ultrasound scanning, a coupling medium must be used between the transducer and the skin to displace air from the transducer-skin interface. Generally, ultrasound gel is used, as it not only displaces air but can also conform to the surface of the transducer to provide a flatter contact point.
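The dependence of reflection on impedance mismatch, and of refraction on the speed of sound, can be illustrated with the standard normal-incidence intensity reflection coefficient and Snell's law. The impedance and speed values below are approximate textbook figures used only for illustration; they are assumptions and are not taken from the cited references.

```python
import math

def intensity_reflection(z1, z2):
    """Fraction of incident intensity reflected at normal incidence."""
    return ((z2 - z1) / (z2 + z1)) ** 2

def refraction_angle(theta_i_deg, c1, c2):
    """Snell's law: sin(theta_t) / sin(theta_i) = c2 / c1.

    Returns the transmitted angle in degrees, or None beyond the critical angle.
    """
    s = math.sin(math.radians(theta_i_deg)) * c2 / c1
    if abs(s) > 1:
        return None
    return math.degrees(math.asin(s))

# Approximate illustrative values (assumptions)
Z = {"fat": 1.38e6, "muscle": 1.70e6, "bone": 7.8e6}  # acoustic impedance, rayl
C = {"fat": 1450.0, "muscle": 1580.0}                  # speed of sound, m/s

r_fat_muscle = intensity_reflection(Z["fat"], Z["muscle"])
r_muscle_bone = intensity_reflection(Z["muscle"], Z["bone"])
theta_t = refraction_angle(20.0, C["fat"], C["muscle"])
```

With these values, roughly 1% of the intensity is reflected at a fat-muscle boundary and roughly 40% at a muscle-bone boundary, which is consistent with the strong bone echo described above.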
Scattering is the redirection of sound in any direction, often by rough surfaces or heterogeneous media. Normally, the scattering intensity is much less than reflection intensity, especially if the transducer is perpendicular to the target tissue, and so scattering tends to introduce a global low-level noise to the signal.
Absorption is the direct conversion of sound energy into heat. Higher-frequency sound tends to be absorbed at greater rates than lower-frequency sound and thus generates more heat in the body. The tradeoff in frequency selection is that although higher-frequency sound has a lower penetration depth and greater absorption, it also has better axial resolution. Generally, the frequencies most employed for imaging skeletal muscle are between 3.5 and 10 MHz. [82]

Aspects of Spatial Resolution
In ultrasound imaging, there are two aspects of spatial resolution: axial and lateral. Axial resolution is the ability to discern between two points parallel to the sound beam's path. Lateral resolution is the ability to discern between two points perpendicular to the beam's path. The selection of frequency affects both aspects. At higher frequencies, the penetration depth of the ultrasound wave is lower, but the axial resolution is higher. Lateral resolution is a function of line density: the narrower the beam of ultrasound, the more densely the acoustic waves are packed, and so the better the resolution. At higher frequencies, the ultrasound beam is narrower, thus improving lateral resolution.

Ultrasound Modes
Though there are many modes of ultrasound imaging, only the three depicted in Figure 9 are relevant for the purposes of this paper. A-mode ultrasounds are the simplest of the three modes we will discuss, where a 1D acoustic pulse is sent into the target tissue, and the resulting echo at multiple depths (estimated by the time elapsed between emission and reception of the acoustic wave) is measured as amplitude. [83] Because tissue boundaries (the points at which there is a marked change in tissue type, like fat to muscle) reflect a significant portion of incoming acoustic waves, changes in A-mode ultrasound over time can be reflective of changes in the depth of a tissue boundary. For example, the A-mode ultrasound echo signal on the human forearm muscle can reflect the interface of the muscle-bone and muscle deformation. While this mode doesn't yield significant structural information, it can be used for exceptionally efficient generation of functional imaging. B-mode ultrasounds produce a 2D image of the target tissue within a particular cross-section by simultaneously scanning the area with 100-300 distinct piezoelectric elements. Therefore, B-mode ultrasounds are effectively a collection of A-mode ultrasounds obtained in rapid succession across a broader area. The amplitude of the echo is converted into brightness, while the horizontal (width) and vertical (depth) directions represent real distances within the tissue. M-mode ultrasounds do the same thing as B-mode ultrasounds but do so repeatedly over the course of time to provide a video of a 2D cross-section of tissue. [83] Generally, M-mode ultrasounds are used to track muscle activity in vivo. [84]
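The depth estimate used in A-mode imaging follows from the pulse-echo relation d = c·t/2, since the pulse travels to the reflector and back. The sketch below assumes the conventional soft-tissue average of 1540 m/s for the speed of sound; the function names and sample values are hypothetical.

```python
SPEED_OF_SOUND = 1540.0  # m/s, conventional soft-tissue average (assumption)

def echo_depth_mm(round_trip_time_s, c=SPEED_OF_SOUND):
    """Depth of a reflector from the pulse-echo round-trip time: d = c * t / 2."""
    return 1000.0 * c * round_trip_time_s / 2.0

def strongest_echo_depth(amplitudes, fs, c=SPEED_OF_SOUND):
    """Depth (mm) of the largest-amplitude sample in one A-mode line sampled at fs Hz."""
    peak = max(range(len(amplitudes)), key=lambda i: abs(amplitudes[i]))
    return echo_depth_mm(peak / fs, c)
```

Applying `strongest_echo_depth` to successive A-mode lines gives a simple estimate of how a dominant tissue boundary moves over time, which is the functional signal described above.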

Instrumentation
Ultrasound devices are essentially composed of a transducer probe, a transducer pulse controller, and a processing unit. [83,85] The transducer probe produces ultrasound waves and receives the reflected echoes. It accomplishes this through the use of piezoelectric crystals, which have the unique property of undergoing mechanical deformation in response to electrical stimulus. When done correctly, this mechanical deformation can occur quickly enough to become a vibration, which can produce sound waves that travel outwards. A second important aspect of piezoelectric crystals is their ability to emit electric currents in response to mechanical deformation. This principle can therefore be leveraged to use the same piezoelectric crystal to both emit sound waves and receive them. To prevent measurement of reflections from the probe itself, the probe has a sound-absorbing layer behind it, as well as an acoustic lens in front to focus emitted sound waves. It should be noted that transducer probes may contain one or more piezoelectric crystal elements, each of which has its own circuit. The advantage of a multicrystal setup is that the ultrasonic beam can be "steered" by changing the timing with which each element is pulsed; steering the beam can be especially useful for cardiac ultrasounds. The shape of the transducer probe itself can also determine the field of view of the ultrasound image, just as the selection of frequency can determine the penetration depth of the ultrasound wave. The transducer pulse controller allows the operator to set and change the frequency and duration of ultrasound pulses, as well as the scan mode of the machine. As mentioned earlier, the choice of frequency affects the penetration depth as well as the axial and lateral resolution.
The duration of the ultrasound pulses, however, also inadvertently determines the temporal resolution of the ultrasound; when a piezoelectric crystal is emitting sound, it cannot receive echoes. As such, the pulse duration also determines how much time elapses between each recording instance. Longer pulse durations will therefore result in lower temporal resolution, just as shorter pulse durations yield a higher temporal resolution.

Analog Signal Processing
Figure 10 depicts the instrumentation pipeline for ultrasounds. The transducer pulse controller sends a signal to produce a particular type of wave, typically in the form of pulse trains through the TX. Prior to the transducer itself receiving the signal, a high-voltage amplifier is used to increase the voltage gain. This signal is sent through a transmission/receiver (TX/RX) multiplexer, thus allowing the same transducer to be both a transmitter and a receiver, reducing the number of connections needed. When the transducer sends an acoustic wave through the body, the body tissue will greatly attenuate the signal (due in part to the absorption of energy). As such, when the transducer receives the signal, it sends the input signal through the receiver channel (RX) to an instrumentation amplifier to improve signal quality and mitigate common-mode noise. Then, the signal is sent through a variable gain amplifier: because ultrasound waves experience exponential attenuation within tissue, the application of exponential gain can reduce the effect of this attenuation. [86]

Digital Preprocessing
The preprocessing of ultrasound images is done to remove speckle noise resulting from wave interference. In B-mode ultrasound (and by extension, M-mode ultrasound), the input signal after analog preprocessing is sent through three stages: beamforming process, 2D image formation, and image enhancement. Figure 10 depicts the digital preprocessing pipeline following analog preprocessing.
Beamforming Process: First, a digital signal filter must be applied to remove noise prior to envelope detection. Though bandpass filters, through the use of Butterworth and low- and high-pass filters, are decently effective, [87] wavelet-based noise filtering appears more effective in preserving high SNR [88] while still removing speckle noise. [89] Then, a Hilbert transform is used to extract the envelope of the signal, [90] due in part to the property that the resulting analytic signal has no spectrum components at negative frequencies [91] while its spectrum components at positive frequencies are doubled. [92]
2D Image Formation: In this portion, the input signal is compressed via a log function to reduce its dynamic range, so that detail in low-amplitude regions of the image remains visible alongside high-amplitude regions. [93] Then the amplitudes of the signal across spatial dimensions are converted to brightness.
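Envelope detection and log compression can be sketched as follows. The envelope is computed from the analytic signal, built exactly as described above by zeroing negative-frequency components and doubling positive-frequency ones. This is an illustrative sketch only: a naive O(N²) transform is used to keep the example self-contained, and the function names and the 60 dB display range are assumptions.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for short illustrative signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def envelope(signal):
    """Magnitude of the analytic signal: zero negative frequencies, double positive ones."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        if 0 < k < n / 2:
            spectrum[k] *= 2   # positive frequencies
        elif k > n / 2:
            spectrum[k] = 0    # negative frequencies
    return [abs(v) for v in dft(spectrum, inverse=True)]

def log_compress(env, dynamic_range_db=60.0):
    """Map envelope amplitudes to [0, 1] on a decibel scale relative to the peak."""
    peak = max(env)
    out = []
    for v in env:
        db = 20 * math.log10(max(v, 1e-12) / peak)
        out.append(max(0.0, 1.0 + db / dynamic_range_db))
    return out
```

After compression, an echo ten times weaker than the peak still maps to about two thirds of full brightness rather than one tenth, which is why weak reflectors remain visible in the final image.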
Image Enhancement: Generally, image enhancement can be in the form of histogram processing or spatial filtering. For the purposes of this paper, it isn't especially important to understand the specific algorithms used for each. However, in general, histogram processing techniques, like histogram equalization, [94] are used to manipulate the brightness and contrast of an image by affecting the intensity distribution of the image or the probability density function itself. Spatial filtering, in contrast, encompasses operations performed directly on the pixels of an image, often to smooth, sharpen, or filter the image. Good examples of spatial filters include Gaussian filters, median and mean filters, [95] and the Wiener filter. [96]

Signal Analysis
Once a raw sensing signal has been preprocessed-be it through analog or digital means-naturally, the next step is to extract meaningful information from the collected data. Muscle tracking technologies have been relying on a broad spectrum of algorithms, most of which can be classified into three primary categories: time-frequency domain, machine learning, and deep learning.

Time-Frequency Domain
Time-frequency domain algorithms refer to a class of algorithms designed to extract frequency characteristics from time-series signals and vice versa. Though this is often used in signal preprocessing for noise reduction, it also has valuable applications for understanding the composition of an input signal.

Fourier Transform
Intuition: In many cases, time-series data can be broken down into constituent components of sinusoidal waveforms at different frequencies, intensities, and phases. In the case of Figure 11a, the time-series signal can be broken down into its three constituent sinusoids at any given point in time. Extracting spectral information over a given period of time allows for the decomposition of input data into frequency information. The Fourier transform (FT) is used to decompose input functions depending on space or time into functions depending on spatial or temporal frequency, though often it is used to decompose time-domain functions into frequency-domain functions. [97] To understand the mathematical theory behind FTs, see the Supporting Information section.
The ability to deconstruct an input signal into an amalgamation of various sinusoids can be useful both in noise detection-identifying certain frequencies that cannot be indicative of signals, such as 60 Hz electrical noise-and signal analysis-identifying patterns of activity. Coorevits et al., [98] for example, made use of a variation of FT to characterize isometric contractions of the hip and back muscles.
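The decomposition of a signal into its constituent sinusoids can be illustrated with a discrete Fourier transform. The sketch below builds a signal from three sinusoids and recovers their frequencies and amplitudes. The sampling rate, the three frequencies (including a 60 Hz component standing in for electrical noise), and the amplitudes are illustrative assumptions.

```python
import cmath
import math

def dft_magnitudes(x):
    """Single-sided amplitude spectrum from a naive O(N^2) DFT."""
    n = len(x)
    mags = []
    for k in range(n // 2):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        scale = 1 if k == 0 else 2   # fold negative frequencies onto positive ones
        mags.append(scale * abs(coeff) / n)
    return mags

fs = 200   # sampling rate in Hz (assumption)
n = 200    # one second of data, so bin k corresponds to k Hz
signal = [1.0 * math.sin(2 * math.pi * 10 * t / fs)
          + 0.5 * math.sin(2 * math.pi * 25 * t / fs)
          + 0.8 * math.sin(2 * math.pi * 60 * t / fs)   # stand-in for mains noise
          for t in range(n)]

spectrum = dft_magnitudes(signal)
peaks = [k for k, m in enumerate(spectrum) if m > 0.1]
```

Only the bins at 10, 25, and 60 Hz carry appreciable amplitude, so a noise component at a known frequency can be identified and removed without disturbing the remaining components.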

Wavelet Transform
Both wavelet and FTs are used to extract frequency information from time-series signals. However, the necessity of the wavelet transform (WT) arises from a weakness of FTs: they capture global frequency, not local frequency. In the previous section, we discussed how the FT is calculated via an integral over the entire dataset (0 to infinity). As such, the frequency components captured are those that persist over the entire input signal, not those present at a specific point in time. Of course, attempts have been made to capture more local frequency information with FTs, like short-time FTs, [99] but in general, these approaches only give an approximation of local frequency information. WTs are used to compensate for this shortcoming of FTs.
Figure 11. a) Depiction of the intuition behind Fourier transforms, where an input time-series signal (red) is composed of three sinusoids, thus populating the frequency domain (blue). b) Shows the difference in the waveforms used to extract frequency information from time-series signals between Fourier and wavelet transforms.
Intuition: One key difference between Fourier and WTs is the waveforms used to approximate frequency information from time-series signals, as depicted in Figure 11b. Where FT uses sinusoids of varying frequencies, WT uses wavelets. A wavelet is a wave-like oscillation that is localized in time: essentially an oscillation that decays quickly with respect to time. There are a number of wavelets, each with unique properties, but in general, there are two parameters of importance for wavelets: scale and translation. Scale refers to the degree of compression of the wavelet, while translation refers to the time location of the wavelet. The goal of WTs is to compute how much of a given wavelet, of a specific scale, is in a signal at a particular time. As such, a WT can be accomplished by convolving a signal with a set of wavelets at a variety of scales to obtain local frequency information. For a better understanding of the mathematical theory behind WT, reference the Supporting Information section.
In certain cases, the advantage of WT over FT may not be as impactful. Coorevits et al. [98] found that in the case of sEMG recordings of isometric contractions, short-time FT provided equally accurate characterization of hip and back muscle activity in comparison to WT. However, the advantage of WT lies especially in nonisometric movements. Nardo et al. [100] made use of WTs to assess the frequency range of every nonisometric muscle activation detected by sEMG. In addition, WTs have been found to reflect signal components relating to the activities of variable-speed muscle fibers as well as the muscle timing characteristics, making them ideal for optimizing training protocols for rehabilitation. [101]
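The idea of measuring how much of a wavelet of a given scale is present at a given time can be sketched with a complex Morlet wavelet, one of several wavelet families in common use. The wavelet choice, the parameter value of 6, and the test signal (10 Hz in its first second, 40 Hz in its second) are assumptions made for illustration.

```python
import cmath
import math

OMEGA0 = 6.0  # Morlet center-frequency parameter (a common choice; assumption)

def morlet_coefficient(x, fs, freq_hz, center_idx):
    """One wavelet coefficient: the signal correlated with a scaled, shifted Morlet wavelet."""
    scale = OMEGA0 * fs / (2 * math.pi * freq_hz)   # scale in samples
    half_width = int(4 * scale)                     # the wavelet is negligible beyond this
    total = 0j
    start = max(0, center_idx - half_width)
    stop = min(len(x), center_idx + half_width + 1)
    for t in range(start, stop):
        u = (t - center_idx) / scale
        wavelet = math.exp(-u * u / 2) * cmath.exp(1j * OMEGA0 * u)
        total += x[t] * wavelet.conjugate()
    return abs(total) / math.sqrt(scale)

fs = 200
signal = [math.sin(2 * math.pi * (10 if t < 200 else 40) * t / fs) for t in range(400)]

# Scale matched to 10 Hz, evaluated at 0.5 s and 1.5 s
early_10, late_10 = (morlet_coefficient(signal, fs, 10, c) for c in (100, 300))
# Scale matched to 40 Hz, evaluated at the same two times
early_40, late_40 = (morlet_coefficient(signal, fs, 40, c) for c in (100, 300))
```

The 10 Hz coefficient is large only in the first second and the 40 Hz coefficient only in the second, whereas a Fourier transform of the whole record would report both frequencies without indicating when each occurred.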

Machine Learning
Machine learning algorithms refer to a set of algorithms that have the ability to learn without being explicitly programmed to do so. From this perspective, even linear regression can be considered a machine learning algorithm, as the optimization of the feature parameters (slope and intercept) is often accomplished using gradient descent, an iterative optimization algorithm that converges onto a solution by repeatedly adjusting the parameters in the direction that most reduces the error. Machine learning as a class of algorithms is far too broad a field to cover in this review, so we will focus on three core algorithms for the field of muscle control: support vector machine, PCA, and nonnegative matrix factorization.
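The remark above that the slope and intercept of a linear regression can be learned by gradient descent can be made concrete with a short sketch. The learning rate, step count, and toy data are illustrative assumptions.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = slope * x + intercept by gradient descent on the mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        grad_slope = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_intercept = 2 * sum(errors) / n
        # Step against the gradient, i.e. in the direction that reduces the error
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)
```

Neither parameter is set by hand: both are recovered from the data alone, which is the sense in which the algorithm "learns".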

Support Vector Machine/Regression (SVM/SVR)
Both SVM and SVR derive from the same underlying algorithm, though each has different applications. The former attempts to perform binary classification of data, such as the patterns of muscle activation, [102] while the latter can be used to construct a relationship between EMG signals and force generation. [103] Figure 12a depicts the mechanism behind SVM.
Intuition: SVM was originally intended to be a binary classifier of objects on the basis of training data. [104,105] In a given feature space (an n-dimensional space containing all instances in a dataset), SVM aims to construct a hyperplane (an (n-1)-dimensional flat surface) that best separates training data with different class labels. This hyperplane is determined by a limited subset of training instances, called support vectors, with the goal of maximizing the margin on each side of the plane. A distinctive feature of SVM is its ability to separate data that are not linearly separable in the original feature space. When training data is not linearly separable, SVM can make use of kernels to project the data into feature spaces of higher dimensionality where linear separation of classes is possible. [106] For a given feature space, a successfully derived hyperplane represents a classification model that can be used to predict the class label of test instances in this space. Depending on which side of the hyperplane a given test instance falls, the model can classify it into one of two classes.
SVR is an extension of the SVM algorithm, with a goal of predicting numerical values instead of class for test instances. [107] To do so, it utilizes the same hyperplane used in SVM but also makes use of decision boundaries, which can be understood as a margin of error around the hyperplane itself. [108] Instances that fall within the decision boundaries, or margin of error from the hyperplane, are "accounted for" by the regression model, while those that fall outside are not. As such, the goal of SVR is to maximize how many training instances are accounted for by the model and then use the resulting model to predict numerical output values for test instances. Just as with SVM, when the training data is nonlinear, a kernel can be used to project the dataset to a feature space in which a linear model can fit the data.
For more information on the mechanisms behind SVM and SVR, see the Supporting Information section.
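A linear SVM classifier can be sketched without any external library. Note that this sketch trains the hyperplane by subgradient descent on the hinge loss rather than with the quadratic-programming solvers typically used in practice, and it omits kernels entirely. The function names, hyperparameters, and the two-feature toy data (standing in for two normalized signal features) are assumptions.

```python
def train_linear_svm(points, labels, reg=0.01, learning_rate=0.01, epochs=200):
    """Soft-margin linear SVM trained by subgradient descent on the hinge loss.

    labels must be +1 or -1. Returns (weights, bias) for the hyperplane w.x + b = 0.
    """
    dim = len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization shrinks w, which corresponds to widening the margin
            w = [wi - learning_rate * reg * wi for wi in w]
            if margin < 1:
                # Instance is inside the margin or misclassified: move the plane
                w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
                b += learning_rate * y
    return w, b

def predict(w, b, x):
    """Classify by the side of the hyperplane on which x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

points = [(-2.0, -1.0), (-1.0, -2.0), (-2.0, -2.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(points, labels)
```

Once training has settled, only instances that violate the margin trigger updates, so the final hyperplane is shaped by the instances closest to the boundary, in line with the role of support vectors described above.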

Principal Component Analysis
Intuition: PCA is a machine learning algorithm designed to reduce the dimensionality of large datasets by transforming a large set of variables into a smaller one that still retains much of the information in the large set. If we consider the case of predicting muscle contraction force, we can see there are a number of variables that play a role: motor neuron firing rate, number of motor neurons innervating muscle, size of each MU, etc. In addition, there are many variables that might play a role, but have variable impact: MU size, myelination of motor neurons, hydration etc. PCA attempts to extract whatever combinations of variables explain the most variance in the dataset.
Looking at Figure 12b, it is apparent "combinations of variables" refers to redefining the axes however possible to maximize the variance accounted for by a given axis or principal component. While Figure 12b depicts this in two dimensions, this principle can apply to any n dimensions. By representing input data as principal components defined by combinations of features, PCA can drastically reduce the number of features needed to accurately represent the same data. For the mathematical theory behind PCA, refer to the Supporting Information section.
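For two variables, the redefinition of axes described above can be computed in closed form from the covariance matrix. The sketch below is an illustrative two-dimensional case with a hypothetical function name; higher-dimensional data would normally require a general eigendecomposition routine.

```python
import math

def pca_2d(points):
    """First principal axis of 2-D data from its 2x2 covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]]
    mean_var = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    eigenvalues = (mean_var + spread, mean_var - spread)
    # Direction of the axis along which the variance is largest
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    pc1 = (math.cos(angle), math.sin(angle))
    explained = eigenvalues[0] / (eigenvalues[0] + eigenvalues[1])
    return pc1, eigenvalues, explained
```

When two variables are perfectly correlated, the first principal component lies along the line they share and explains all of the variance, so a single feature suffices to represent the data.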

Nonnegative Matrix Factorization
Intuition: Nonnegative matrix factorization (NMF) is a machine learning algorithm with a goal of dimensionality reduction-reducing the number of features needed to accurately represent a dataset. In this, an input matrix X-as in Figure 12c, a video of four circles turning on and off repeatedly-is factored into, usually, two matrices W and H, with the unique property that all three matrices have no negative elements. In the case of muscular activity, nonnegativity is inherent to the data itself and so poses no problem.
Let X be an m × n matrix. When X is factored into W and H, they have dimensions m × r and r × n, respectively. Often, the rank r of the resulting matrices is significantly lower than both m and n. In a case where X represents a video of four lamps arranged in a grid turning on at various times, m is the number of pixels in the image, and n is the number of images, or frames, in the movie. When X is factored into W and H, W represents r unique spatial components in the video while H represents r unique temporal components. If, in the video, there were four lamps, when r is set to four, we would expect the columns in W to reflect the spatial locations of the four lamps and the rows of H to reflect the temporal activity of each lamp. For a better understanding of the mathematical theory, see the Supporting Information section.
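The factorization can be sketched with the widely used multiplicative update rules, which keep every entry of W and H nonnegative. This is one of several NMF algorithms and is not necessarily the one used in the cited works. The toy matrix below (four pixels, six frames, two lamps) and all names are illustrative assumptions.

```python
import random

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def frobenius_error(x, w, h):
    """Sum of squared differences between X and its reconstruction WH."""
    wh = matmul(w, h)
    return sum((x[i][j] - wh[i][j]) ** 2 for i in range(len(x)) for j in range(len(x[0])))

def nmf(x, rank, iterations=200, seed=0, eps=1e-9):
    """Factor X (m x n) into W (m x r) and H (r x n) using multiplicative updates."""
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return w, h

# Rows are pixels, columns are frames; two lamps, each covering two pixels
x = [[1, 0, 1, 0, 1, 1],
     [1, 0, 1, 0, 1, 1],
     [0, 1, 1, 0, 0, 1],
     [0, 1, 1, 0, 0, 1]]
w, h = nmf(x, rank=2)
```

Because each update multiplies the current entries by a ratio of nonnegative quantities, nonnegativity is preserved automatically and no explicit constraint handling is needed.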

Deep Learning
Deep learning is a subset of machine learning relying upon the use of neural networks-artificial approximations of neurons and their connections-to process and analyze data. Artificial neurons are simply a placeholder for a mathematical function that attempts to transform a set of inputs into a single output using an activation function. Generally, artificial neurons accept a vector of inputs, multiply each input by its relative weight-corresponding to the strength of impact of a given input on the neuron-sum the weighted inputs, and pass the sum through an activation function that transforms an input x into an output y. While there are a number of activation functions in use today, some common ones include ReLU, sigmoid, and tanh.
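As a minimal sketch of the artificial neuron described above, the following computes a weighted sum of the inputs and passes it through each of the three named activation functions. The input and weight values are arbitrary, and the bias term found in most practical networks is omitted here to stay close to the description in the text.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, activation):
    """Weighted sum of the inputs followed by an activation function."""
    return activation(np.dot(w, x))

x = np.array([0.5, -1.0, 2.0])   # inputs (arbitrary example values)
w = np.array([0.4, 0.3, -0.2])   # weights (arbitrary example values)

# The weighted sum is 0.2 - 0.3 - 0.4 = -0.5 in all three cases.
out_relu = neuron(x, w, relu)        # 0.0
out_sigmoid = neuron(x, w, sigmoid)  # about 0.3775
out_tanh = neuron(x, w, np.tanh)     # about -0.4621
print(out_relu, out_sigmoid, out_tanh)
```

The same weighted sum yields three different outputs, which shows how the choice of activation function shapes what a neuron passes on to the next layer.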
Artificial neurons can be organized into networks, where their collective function is determined in part by how they are organized: the layers they are arranged into, the number of neurons per layer, how each layer connects to one another, etc. These networks are called neural networks, with a more fundamental example being multilayer perceptrons. To better understand the fundamentals of neural networks, see the Supporting Information section.

Convolutional Neural Network
Convolutional neural networks (CNNs), depicted in Figure 13a, are a type of deep learning model designed to automatically and adaptively learn spatial hierarchies of features, from low- to high-level patterns. Generally, there are three types of layers in a CNN: convolution, pooling, and fully connected layers. The first two perform feature extraction, while the third maps the extracted features into a final output, such as a classification. [109] Convolution is a specialized type of linear operation used for feature extraction, where a small array of numbers, called a kernel, is applied across the input, which is itself an array of numbers, or tensor. At each location of the input tensor, the element-wise product between the kernel and the underlying patch of the input is calculated and summed to obtain the output value in the corresponding position of the output tensor, called a feature map. As such, each kernel is convolved across the entire input tensor, with the resulting output being captured in a feature map that is smaller than the original tensor. This same process is repeated with different kernels, each of which acts as a different feature extractor.
There are three key hyperparameters that define the convolution layer: kernel size, kernel stride, and kernel number. Kernel size refers to how large the kernel convolving over the input tensor is: larger kernels can encode more complex features but create a smaller feature map because, without padding, the kernel must remain within the confines of the input tensor. Kernel stride refers to the distance between two successive kernel positions; larger strides result in smaller feature maps and downsampling of the input but can also reduce computational time. Kernel number is simply the number of distinct kernels passed over an input: the more there are, the more unique features are extracted. [110]
Pooling layers provide downsampling operations, which reduce the size of feature maps to introduce translation invariance to small shifts and distortions and to decrease the number of subsequent learnable parameters. One example of a pooling layer is max pooling. In this, patches from the input feature map (the output of the convolution layer) are extracted, and the maximum value from each patch is passed on to the pooling layer output.
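The convolution and max pooling operations can be sketched directly. The example below assumes no padding and a single 2D input channel. As in most CNN implementations, the kernel is applied without flipping (strictly a cross-correlation). The 6 × 6 test image and the edge-detecting kernel are hypothetical.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image (no padding) and sum element-wise products."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def max_pool2d(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size patch."""
    out_h, out_w = fmap.shape[0] // size, fmap.shape[1] // size
    trimmed = fmap[:out_h * size, :out_w * size]
    return trimmed.reshape(out_h, size, out_w, size).max(axis=(1, 3))

# Hypothetical 6x6 image: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])      # responds to left-to-right brightness increases

fmap = conv2d(image, kernel)          # 5x5 feature map, nonzero only along the edge
pooled = max_pool2d(fmap)             # 2x2 map: [[0, 2], [0, 2]]
strided = conv2d(image, kernel, stride=2)  # 3x3 feature map
print(fmap.shape, pooled, strided.shape)
```

The feature map shrinks from 6 × 6 to 5 × 5 with a 2 × 2 kernel at stride 1, and to 3 × 3 at stride 2, consistent with the effect of kernel size and stride described above. Max pooling keeps the edge response while halving each dimension.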
The ability of CNNs to extract features from inputs is what makes them invaluable for extracting spatial features from input images.

Recurrent Neural Networks
Recurrent neural networks (RNNs), depicted in Figure 13b, process sequential data by feeding the output from the previous step as an input to the current step. [111] While they learn from training inputs, just as multilayer perceptrons (MLPs) and CNNs do, their distinguishing feature is memory-their ability to allow information from previous inputs to impact the current input and output. [110] A common mechanism supporting this feature is the addition of gates-features of a single node, or neuron, that regulate the 'state' of a neuron by selectively letting information through. Two common types of RNNs are long short-term memory (LSTM) and gated recurrent unit (GRU) RNNs, with the difference between the two arising from the architecture of their respective neurons. [112]
Figure 13b details the internal operations of an LSTM-RNN. The key to an individual LSTM node, or neuron, is the cell state-represented as the horizontal line running along the top of the node. Though it is possible for the cell state to remain unchanged from entry to exit, LSTMs have the ability to modify the cell state through the use of gates. Gates are composed of a sigmoid layer and a pointwise multiplication operation. The sigmoid layer outputs numbers between zero and one, effectively determining "how much" of a given component should be passed through when this output is multiplied with the cell state. The first step in an LSTM is the forget gate, which determines how much information from the cell state will be disregarded. To do so, it looks at the previous output $h_{t-1}$ and the current input $X_t$, and outputs a number between 0 and 1 for each number in the previous cell state $C_{t-1}$. In this case, a "1" indicates that the corresponding value is preserved entirely, while a "0" indicates that it is discarded. The next step is the input gate, which determines what information should enter the cell state. First, a sigmoid layer decides what values to update, and then a tanh layer (activation function) creates a vector of new candidate values for the cell state. The results of these two layers are multiplied together and added to the cell state. The final step is the output gate, which controls the output of the neuron. First, a sigmoid layer decides what part of the cell state will be passed on, while the cell state itself is passed through a tanh layer. The results of these two layers are multiplied together to output only the desired portions of the cell state. This result becomes the new output, $h_t$.
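The three gates can be written out as a single time step. The sketch below follows the standard LSTM formulation, in which each gate acts on the concatenation of the previous output and the current input. The parameter names (`W_f`, `b_f`, and so on), the layer sizes, and the random weights are hypothetical; a trained network would learn these values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step: returns the new output h_t and cell state c_t."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(params["W_f"] @ z + params["b_f"])        # forget gate
    i = sigmoid(params["W_i"] @ z + params["b_i"])        # input gate
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])  # candidate values
    c_t = f * c_prev + i * c_tilde                        # updated cell state
    o = sigmoid(params["W_o"] @ z + params["b_o"])        # output gate
    h_t = o * np.tanh(c_t)                                # new output
    return h_t, c_t

# Hypothetical sizes: 2 input features, 3 hidden units, random weights.
input_size, hidden_size = 2, 3
rng = np.random.default_rng(0)
params = {}
for gate in ("f", "i", "c", "o"):
    params[f"W_{gate}"] = rng.normal(scale=0.5, size=(hidden_size, hidden_size + input_size))
    params[f"b_{gate}"] = np.zeros(hidden_size)

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
sequence = rng.normal(size=(5, input_size))  # five time steps of input
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, params)
print("final output:", h)
print("final cell state:", c)
```

Carrying `h` and `c` from one iteration to the next is what gives the network its memory: the forget gate `f` scales the old cell state, and the input gate `i` scales the newly proposed values.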
There are many other types and modifications of RNNs, like GRU RNNs, which use a similar gating method to control the passage of information but have fewer parameters to tune since they have only two gates-a reset gate and an update gate, with no output gate. [113] In general, because there are fewer parameters to tune, GRUs are able to offer comparable performance to LSTMs while being significantly faster to compute. [114] Both LSTMs and GRUs have proven effective for a wide variety of applications; because they can act on "memory", they have been used to analyze and classify time-series signals, including EMG signals. [115]
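The difference in parameter count can be made concrete with a short calculation. It assumes the standard formulations, in which an LSTM layer has four weight blocks (forget, input, output, and candidate) and a GRU layer has three (reset, update, and candidate), each with one bias vector. Some libraries use a slightly different bias layout, so exact counts may differ. The layer sizes below are hypothetical.

```python
def gated_rnn_params(input_size, hidden_size, n_blocks):
    """Trainable parameters in one gated recurrent layer.

    Each block has a weight matrix acting on [h_prev, x_t] plus a bias vector.
    """
    per_block = hidden_size * (hidden_size + input_size) + hidden_size
    return n_blocks * per_block

# Hypothetical layer: 8 input channels and 64 hidden units.
lstm_params = gated_rnn_params(input_size=8, hidden_size=64, n_blocks=4)  # 18688
gru_params = gated_rnn_params(input_size=8, hidden_size=64, n_blocks=3)   # 14016
print(lstm_params, gru_params, gru_params / lstm_params)
```

Under these assumptions, a GRU layer has three quarters of the parameters of an LSTM layer of the same size, regardless of the layer dimensions.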

Applications
Various sensing modalities and algorithms have been utilized in the past to image and track the activity of muscles. For simplicity, we separate the following section into two parts: Historical Overview and Key Advances. The former will provide a broader understanding of the technologies used for muscle tracking, while the latter will highlight key advances in the field. It should be noted that the purpose of this paper is to identify technologies that produce data that can be used to detect muscle weakness and tremors in real time while the patient is mobile, thus necessitating a focus on wearable devices with high spatial and temporal resolutions, low processing latency, and, if applicable, high classification accuracy.

Historical Overview
Generally, these applications can be separated into one of three categories: structural, functional, or structural-functional imaging. The first refers to applications that produce effectively static information about a muscle. The second refers to applications that produce poor spatial understanding of muscles but good temporal resolution of muscle activity. The third refers to applications that perform well in both departments, producing good spatial and temporal resolution videos of muscle activity. Since many applications cannot be strictly defined into one category or another, it is better to view these classes as a spectrum rather than distinct categories. In addition, a given sensing modality may produce different outputs, so the following section will be organized by sensing modality, with each further subdivided by the general category of imaging. Figure 14 provides a visual summary of key innovations that provide varying levels of spatial and temporal resolution data for muscle imaging.

Electromyography
Since our review focuses on noninvasive, wearable technologies, imEMG is not a relevant approach for our purposes, making sEMG the primary focus of this section. By its nature, sEMG is not easily capable of obtaining high-resolution structural information, making functional imaging its primary output. In addition, because sEMG requires conductive electrodes to be placed on the skin, factors that affect skin conductance-like sweat production-can skew results over time. It should be noted that in many cases, sEMG does not generate exceptionally large quantities of data, so the bottleneck in sEMG systems is data processing-including noise reduction-rather than data volume.
Functional Imaging: A large amount of research has been dedicated toward the characterization and diagnosis of neuromuscular disorders. Many signal analysis algorithms-time-frequency domain, machine learning, and deep learning-have been used in conjunction with EMGs for this exact purpose. Variations of SVM, for example, have been used alone and in conjunction with other algorithms-like PCA and fast Fourier transforms (FFT) [116] and variations of WTs [117][118][119] -to confirm the diagnosis of neuromuscular disorders, making automated detection of neuromuscular disorders through EMG analysis a very real possibility. [120][121][122] Similarly, many clever implementations of deep learning have been employed for the diagnosis of patients with neuromuscular disorders, including wavelet neural networks [123] -a modification of a standard neural network where the activation function is itself a wavelet [124] -PCA and FFT in conjunction with standard MLPs, [116] and modified feed-forward neural networks (FNN). [123] However, for the purposes of this review, our focus is on applications that allow for the identification of specific muscle activity or their intended movement.
Because EMGs, especially sEMGs, have relatively poor spatial resolution, it is often difficult to estimate the activity of smaller muscle units, like MUs. However, efforts have been made to extract the activity of smaller structural muscle units. Gazzoni et al. [125] made use of a segmentation-classification pipeline, where they utilized complex WT (CWT) to segment the sEMG data and a multichannel neural network (the mechanism being beyond the scope of this review) to classify the data to identify and extract data of individual MU action potentials from simulated sEMG signals. Wimalasena et al. [126] made use of AutoLFADS-a deep learning approach relying on RNNs to model spatial and temporal regularities that underlie muscle activation-with imEMGs to provide estimates of muscle activation for multiple muscles that proved to improve the prediction of joint kinematics when compared to low-pass or Bayesian filtering.
Many attempts have also been made to predict kinematic information from EMG signals. Lenin et al. [127] made use of PCA and a modified WT algorithm-dual-tree CWT-to extract relevant information from sEMG signals, and used a subtype of FNN, called cascaded feed-forward neural networks, to recognize hand grasps using EMG signals. Jaramillo-Yanez et al. [128] proposed a short-term hand gesture recognition software with sEMG using discrete wavelet transform for feature extraction, and SVM as a classifier, to distinguish 5D gestures. Wei et al. [129] proposed a multiview CNN-a deep learning architecture that aggregates information from multiple 2D views of the same target into a single representation [130] -for sEMG-based gesture recognition. In addition, Chen et al. [131] proposed a CNN architecture that relied on transfer learning-where the optimizations derived from previous training rounds are used to speed up the training of the new model-to classify 30 gestures from sEMG data with over 90% accuracy and only two repetitions of gestures for training. While there have certainly been many other models proposed for the purpose of kinematic predictions-limb movement, gestures, etc.-the above examples should be sufficient to show how various classes of algorithms have each been leveraged uniquely to extract information from EMG signals.
Figure 14. A selection of three key applications, one for each modality. a) A flexible, wearable sEMG with inertial measurement units for the online monitoring of muscle activity and joint rotation. Adapted with permission. [183] Copyright 2019, ICORR. b) A wearable bracelet using NIR spectroscopy to classify gestures with multiple degrees of freedom and joint activity. Adapted with permission. [158] Copyright 2017, ACM. c) A wearable multichannel A-mode ultrasound for online classification of high degree-of-freedom movements. Adapted with permission. [181] Copyright 2021, IEEE.

NIR Spectroscopy
A large portion of research in applications of NIRS focuses on functional imaging because hemoglobin and myoglobin-two components that contribute significantly to absorption of NIR light-are present in high levels within muscles and change depending on the oxygenation status of the muscle. [132] This property allows for the changes in absorption of NIR light over time to be correlated to the oxygenation status of the target tissue, often skeletal muscle. [133] As such, while some research has been done on structural imaging of skeletal muscle using NIRS, more has been done for functional/structural-functional imaging.
One advantage of NIRS in wearable systems is that it does not rely on conductance-be it electrical or acoustic-to obtain information. As such, factors that affect the conductance of the skin-sweat production, gel hydration, etc.-have little effect on the output. However, NIRS has the potential to produce large amounts of data-especially in camera-based systems with higher spatial and temporal resolution. As such, many wearable applications must purposefully be designed to capture lower spatial or temporal resolution data to avoid causing a bottleneck in data processing.
Structural Imaging: Within the context of structural imaging, contrast agents are often used to improve spatial resolution of NIR imaging. Chrzanowski et al. [134] made use of the fluorescent dye, indocyanine green (ICG), to image skeletal muscle using NIRS, thus imposing less reliance on algorithms for signal analysis. Historically, NIR fluorescence imaging using ICG suffers from shallow imaging depth, low contrast, and poor clarity caused by light scattering and autofluorescence. [135] To obtain higher clarity images, often other fluorophores can be used in conjunction with NIR imaging in the 1500-1700 nm range, but these fluorophores often contain toxic elements like lead, cadmium, and arsenic, making them unsuitable for clinical use. [136] To obtain higher clarity images with safe fluorophores, Ma et al. [137] made use of generative adversarial networks-a class of deep learning algorithms designed to generate new data with the same distribution as the training data [138] -to transform input images into higher clarity images.
NIRS has also been used to assess the quality of skeletal muscle tissue. Currà et al. [139] made use of PCA and partial least-squares discriminant analysis to discriminate muscle into biceps or triceps, their condition into normal or poststroke, and the effect of botulinum toxin treatment at various time points. It should be noted that the use of contrast agents for spectroscopy is limited by the kidney health of the individual as well as the healthcare resources required to administer such agents, making it impractical for wearable, at-home structural imaging of muscle. However, the approaches utilized by Ma et al. may still prove useful in improving muscle fiber discrimination using NIRS without the use of contrast agents.
Functional Imaging: The reason NIRS can be so useful for functional imaging of skeletal muscle is the strong dependence of skeletal muscle on oxidative metabolism and, in turn, hemoglobin. Continuous wave NIRS oximeters, where the light source is of constant intensity and transmitted light intensity is measured, have been used to assess the extent of muscle activity and subsequent recovery by estimating tissue oxygenation. [140] An improvement to the original single source-detector pair is the spatially resolved spectrometer (SRS), where light intensity is measured at multiple spacings from the source to improve the precision of measurement. [141] In addition, SRS-NIR can enhance the contribution of deeper tissues and attenuate the contribution of more superficial tissues to the NIR signal. [142] Though SRS-NIR utilizes multiple source-detector pairs, it does not produce a high enough spatial resolution to function as structural imaging. Though there are other ways to utilize NIR imaging, such as time-resolved spectroscopy and FD, the former relies upon lasers to produce light pulses with a duration of a few dozen picoseconds, making it impractical for wearable purposes, [143] while the latter's mechanism is beyond the scope of this review, though it has been shown to be effective. [144]
From our search, a significant portion of NIRS functional imaging focuses upon the oxygenation status of muscular tissue to assess its health, not to track individual muscle fascicle activity. For example, NIRS has been used to evaluate patients with peripheral vascular disease (PVD), categorizing the severity of PVD in patients by quantifying metabolic recovery of skeletal muscle. [145] NIRS has also been used to evaluate skeletal muscle in patients with heart diseases, including congestive heart failure, finding impaired muscle metabolism as a result. [146,147] NIRS measurements have been used to study patients with neuromuscular disorders-including cytochrome c oxidase deficiency, [148] mitochondrial DNA mutations, [149] and Friedreich's ataxia [150] -finding a paradoxical increase in muscle oxygenation at the onset of treadmill exercises.
However, research has also been conducted in the use of photoplethysmography (PPG)-the continuous measurement of absorption of light over time to measure the volumetric changes in blood circulation-to detect the relative displacement of blood, as opposed to the changes in concentration of various chromophores. In cases where muscles are adjacent to larger blood vessels, muscle contractions can squeeze blood vessels, in turn causing vessel and blood displacement. Zhao et al. [151] leveraged this feature and used two PPG sensors, combined with a CNN, to distinguish nine finger gestures with an 88.32% accuracy. Similar approaches have been attempted by others, suggesting functional NIRS has the potential to extract significant activity from superficial muscles.
Structural-Functional Imaging: Recent advances in NIRS technology have made use of multiple source-detector pairs to image a skeletal muscle, taking advantage of studies showing regional differences in skeletal muscle metabolism. [152,153] Several multichannel NIRS systems have been developed to detect regional differences in muscle oxygenation to estimate muscle activity, [154] though often at low temporal resolutions. The reason for the lower temporal resolution of such devices is not solely an instrumentation constraint. Rather, because these devices rely upon the oxygenation status of muscles to produce meaningful data, they would not benefit from higher temporal resolutions simply because oxygenation status does not change as rapidly as electrical activity does. As such, while these devices yield structural and functional muscle imaging, both are generally of relatively low resolution, making them insufficient for assessing fine motor activity in real time. It should be noted that there are excellent examples of high spatial and temporal resolution NIRS devices-such as the camera imaging system of Chen et al. [155] for NIR fluorescence image-guided surgery and the NIR optical tomography system of Feng et al. [156] -but to our knowledge, many are not currently wearable devices, making it difficult to implement them as at-home rehabilitative tools. There have also been notable applications of NIR imaging for hemodynamic responses in other parts of the body, including the frontal cortex. Abibullaev et al. [157] used multichannel NIRS to measure brain surface hemoglobin concentrations and then used a continuous WT for multiscale decomposition of the input signal followed by three different classifier models-an artificial neural network, linear discriminant analysis (LDA), and SVM-to classify which mental task the brain activity can be ascribed to: baseline, multiplication, imagery rotation, or letter padding.
While such examples aren't directly applicable to muscle tracking, they do indicate a possibility of utilizing certain signal analysis algorithms in conjunction with NIRS for efficient structural-functional imaging of skeletal muscle.
While not necessarily high in spatial resolution, a number of devices have been proposed to use an approach similar to PPG, where an array of NIR LEDs and photodiodes are used to measure blood displacement and in turn estimate muscle activity. Notable among these attempts is McIntosh et al., [158] who used a multichannel NIRS with an artificial neural network to classify 12 gestures involving multiple degrees of freedom, multiple joints, and minute gestures of the fingers, with a 93.3% accuracy. As such, it is important to note that spatial and temporal resolutions are not the only considerations for the feasibility of a muscle tracker.

Ultrasound
The three main modes of ultrasound relevant for this paper-A, B, and M mode-provide functional, structural, and structural-functional data, respectively. Similar to EMGs, ultrasounds have been used to identify neuromuscular disorders, track functional muscle units over time, and correlate ultrasound data to kinematic data, such as limb movement or gestures. One key advantage of ultrasound imaging over other modalities is the ability to selectively image specific planes of tissue based upon the frequency, thus allowing for the selective imaging of deep tissue or shallow tissue: a feature neither EMG nor NIRS can offer. However, it should be noted that the volume of data produced from M-mode ultrasounds tends to be much larger than that of sEMG or NIRS, making it harder to create wearable M-mode ultrasounds. As such, wearable ultrasound devices generally utilize multichannel A-mode ultrasound to limit data volume. In addition, ultrasound imaging requires consistent acoustic conductance with the skin, often through the use of ultrasound gel. As such, long use of ultrasound imaging-the intended use of wearable systems-can be difficult as variations in gel hydration over time can skew acquired data.
Structural Imaging: Generally, B-mode ultrasounds are used to extract structural data from human tissue. Shi et al. [159] used a combination of B-mode ultrasound and EMG to find a linear relationship between the thickness of the muscle-in this case, the biceps brachii-and the torque generated at the elbow, as well as define a relationship between muscle thickness and fatigue. [160] In addition, by obtaining the deformation field between different finger gestures, Shi et al. [161] were able to obtain a recognition accuracy for five 1D gestures of 94.05%. More specifically, they utilized B-mode ultrasound in conjunction with WTs to extract features and SVM for classification. In addition to predicting kinematic information, B-mode ultrasounds have also been used to diagnose myositis-the inflammation of muscle. Burlina et al. [162] proposed a CNN model for diagnostic classification of ultrasound images, finding a higher accuracy with the proposed model over the conventional machine-learning method based on random forests-a machine-learning algorithm based on creating and optimizing decision trees. [163] Since B-mode ultrasounds rely on singular images, and MUs are often only apparent upon activity, it is difficult to extract spatial information on MUs solely through structural imaging.
Functional Imaging: Because B-mode ultrasound devices tend to be bulky, A-mode ultrasounds can be preferable for use in wearable biosensors. In addition, the computational complexity for processing A-mode ultrasound data is much lower, as it is only 1D data in contrast to B-mode's 2D input. Guo [164] has developed a wearable A-mode ultrasound sensor to deduce the relationship between muscle deformation and wrist extension angle. Hettiarachchi et al. [165] made use of eight transducers utilizing A-mode ultrasound imaging to recognize finger movement of transradial amputees. Li et al. [166] made use of A-mode ultrasound in conjunction with PCA for dimensionality reduction and LDA for classification to recognize six distinct movements with 96% accuracy. Beyond this, there have been studies on force recognition, [167] wrist movement recognition, [168] and arm position [169] based on A-mode ultrasound. More recently, Lu et al. [1] created a wearable gesture recognition pipeline using PCA for dimensionality reduction and LDA and SVM for the classification of 10 gestures in real time.
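The general shape of a PCA-then-LDA pipeline of the kind mentioned above can be sketched as follows. This is not the implementation of any cited study: the data are synthetic Gaussian clusters standing in for per-gesture feature vectors, and the feature count, class count, number of retained components, and function names are all hypothetical. LDA is implemented here in its textbook form, with a pooled within-class covariance and linear discriminant scores.

```python
import numpy as np

def fit_pca(X, k):
    """Return the training mean and the top-k principal axes."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def fit_lda(Z, y):
    """Estimate class means, pooled within-class covariance, and priors."""
    classes = np.unique(y)
    means = np.array([Z[y == c].mean(axis=0) for c in classes])
    centered = Z - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (len(Z) - len(classes))
    priors = np.array([(y == c).mean() for c in classes])
    return classes, means, np.linalg.pinv(cov), priors

def predict_lda(Z, model):
    """Assign each sample to the class with the largest linear discriminant score."""
    classes, means, cov_inv, priors = model
    linear = Z @ cov_inv @ means.T
    const = -0.5 * np.sum((means @ cov_inv) * means, axis=1) + np.log(priors)
    return classes[np.argmax(linear + const, axis=1)]

# Synthetic stand-in for gesture features: 3 classes, 20 features per sample.
rng = np.random.default_rng(0)
n_per_class, n_features, n_classes = 60, 20, 3
centers = rng.normal(scale=3.0, size=(n_classes, n_features))
y = np.repeat(np.arange(n_classes), n_per_class)
X = centers[y] + rng.normal(size=(len(y), n_features))

order = rng.permutation(len(y))
train, test = order[:120], order[120:]

mean, components = fit_pca(X[train], k=5)       # 20 features -> 5 components
Z_train = (X[train] - mean) @ components.T
Z_test = (X[test] - mean) @ components.T

model = fit_lda(Z_train, y[train])
accuracy = (predict_lda(Z_test, model) == y[test]).mean()
print("held-out accuracy on synthetic data:", accuracy)
```

Note that the PCA projection is fitted on the training split only and then reused for the test split, which avoids leaking test information into the features. The accuracy printed here reflects only the well-separated synthetic clusters and says nothing about performance on real ultrasound data.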
Structural-Functional Imaging: To obtain both structural and functional information, at the minimum, a series of B-mode ultrasound images is required, though ideally M-mode ultrasound would be used to obtain consistent, real-time imaging. Researchers have used ultrasound to study muscle fascicle structure and function. Many algorithms have been developed in an attempt to automate the process of detecting and tracking individual muscle fascicle activity, though often non-machine learning methods are utilized. [170,171] While there is work leveraging ML strategies for the analysis of specific skeletal muscle features, [172,173] only recently have algorithms been proposed to successfully measure fascicle length changes through time. Rosa et al. [174] tested five machine-learning models to track muscle fascicle length versus time and found SVM to be the best-performing model. To our knowledge, few attempts have been made to detect and extract information about individual MUs from M-mode ultrasounds, though Ali et al. [175] proposed a deep learning model that uses a CNN to detect MUs and an RNN to extract the temporal activity of each MU. It should also be noted that beyond signal analysis, another mode of ultrasound that can be used to obtain structural and functional imaging is Doppler mode ultrasound, [176] which is beyond the scope of this review to explain in detail but is worth noting.

Key Advances

Electromyography
In 2021, Liu et al. [177] developed NeuroPose: a wearable system for 3D hand pose tracking using an sEMG wearable device. NeuroPose attempts to extract 3D finger motion solely from eight-channel sEMG data through the use of two proposed models: an encoder-decoder architecture with a CNN (specifically, ResNet) embedded within and an RNN with LSTM architecture. The combination of ResNets and encoder-decoder architecture provided the lowest median error in joint angle estimation at only 6.24°, while the LSTM architecture by itself achieved a median error of 10.66°. It is also important to note that the authors made use of transfer learning-pre-training a 'template' of the encoder-decoder model with ResNets by extensive training on one user and then fine-tuning the resulting template for each participant through less intensive training-resulting in a reduction of training overhead on a new user by an order of magnitude. In addition, this degree of accuracy appears to be robust to sensor remounting and wrist position and is reasonably accurate across all fingers and joint angles. To implement NeuroPose, the authors trained the ML model on each user using a desktop and then loaded the generated model onto a smartphone device using TensorFlowLite. The authors also profiled both the latency and power consumption of the wearable system, finding the processing latency to be only 0.101 s and power consumption to be 40 mW for the encoder-decoder with ResNets model. Together, the accuracy, latency, and power results suggest that the algorithm can be implemented for real-time processing on a mobile device for a full day of constant use. As such, this wearable system provides an excellent example of a wearable, noninvasive, and real-time device with high accuracy in finger gesture differentiation in three dimensions.
The extraction of both muscle activity and joint rotation is of significant interest in rehabilitation but is often only accomplished through systems restricted to laboratory settings (optical motion tracking and wireless sEMG measurements). Cotton and Rogers [178] developed a flexible, wearable sEMG system capable of monitoring both joint angle and muscle activity in real time through the use of sEMG (2 kHz), a nine-axis inertial measurement sensor recording acceleration, rotation, and magnetic data (500 Hz), and an accompanying application on an Android platform connected through Bluetooth. The methodology for the pose estimation algorithm implemented by Cotton and Rogers is beyond the scope of this review, but its use enables the estimation of the rotation angle of a given joint. In addition, Cotton and Rogers simultaneously recorded sEMG activity from the target muscle and sent all preprocessed data to a phone via Bluetooth. As the authors also noted, this system can be ideal for at-home rehabilitative monitoring due in part to the modularity of the system-the sensors could be placed on any major joint in the body, and on multiple joints simultaneously, making it a more generalizable tool-and the wearability of the system-the device itself is fabricated on a 0.1 mm thick flexible circuit board, with the electrodes mechanically decoupled from the central device with serpentine wires, thereby making it robust to twisting, stretching, and rotation. However, the system poses two challenges: pose estimation errors due to errors in magnetic calibration or external magnetic fields, and establishing a reference sensor position. The authors also note that both challenges could be resolved through modifications to the smartphone-based application-such as a calibration routine and correcting for magnetic field distortions.
A large obstacle to gesture recognition algorithms is the incorporation of various states of multiple joints that comprise a single gesture. For example, a "thumbs-up" gesture involves ulnar deviation of the wrist and flexion and supination of the elbow in addition to the finger positions. A "thumbs-down" gesture, in contrast, would differ from the "thumbs-up" gesture only in the elbow state-flexion and pronation, not supination. In 2021, Chen et al. [131] proposed an sEMG system using CNN and CNN-LSTM networks to accurately classify 30 hand gestures, composed of variations to the finger, elbow, and wrist joints, while reducing training time by a full order of magnitude through the use of transfer learning-the use of optimizations learned from previous training rounds to reduce the training time for subsequent models. More specifically, two 48-channel sEMG arrays attached to the forearm and two 16-channel sEMG arrays attached to the biceps are sampled at 1 kHz, with the data being passed through a deep learning network. A CNN and a CNN-LSTM network were separately trained and evaluated by their accuracy of gesture prediction and the number of gesture repetitions required to bring the model accuracy to an acceptable level (>90%). Finally, transfer learning was applied to the CNN and CNN-LSTM networks to determine whether doing so improved the accuracy or reduced the number of gesture repetitions required. In both models, the use of transfer learning significantly reduced the number of gesture repetitions required to reach 90% accuracy, thereby also reducing the time required to train the model. Since low generalization of models and heavy training burden are key factors inhibiting the adoption of muscle tracking technology, the work by Chen et al. represents a key advancement in reducing the calibration time required for muscle trackers using deep learning pipelines.

NIR Spectroscopy
Generally, NIRS is used to determine local hemodynamics, often in the form of tissue oxygenation, making it inherently more difficult to obtain real-time information. However, McIntosh et al. [158] developed a wearable bracelet that uses NIRS to characterize changes in the structure of the wrist, through the reflectance and TX of NIR light, and an MLP to classify the data into one of 12 discrete gestures. It is also important to note that these gestures operate in multiple degrees of freedom, with multiple joints; many gestures involved flexion/extension and adduction of the wrist, and others involved more minute gestures with the fingers. Using their proposed model, McIntosh et al. achieved a 93.3% classification accuracy. In addition, because they relied on an infrared system, the device was found to be more resistant to variability in pressure; lifting the entire array off the skin by 1 mm, or wearing the bracelet on top of clear latex, did not significantly affect the accuracy. This observation is especially important considering that both EMG and ultrasound are sensitive to environmental perturbations. Finally, the authors also demonstrated the possibility of calibrating the device using an "open palm position" by using a neural network regressor to extrapolate the orientation of the device from absorption measurements. As such, the device also seems relatively insensitive to small shifts in sensor alignment.
The ease of adoption of a device is often dictated by how easily it can be incorporated into existing infrastructure. Today, many wrist-worn devices utilize PPG to estimate heart rate and pulse oximetry. Zhao et al. [151] utilized three sensors commonly found on wrist-worn devices (two green LED-photodiode pairs serving as PPG sensors, an accelerometer, and a gyroscope) in conjunction with either a CNN (ResNet) classifier or a gradient-boosted-tree (GBT) classifier. In both cases, when all sensors were utilized, the precision and recall in the classification of nine sign language gestures surpassed 95% with adequate training. When only the PPG sensors were used, the average precision and recall were still over 89%. The computational latencies induced by the GBT and ResNet classifiers were 0.651 and 0.601 s, respectively, with processing performed by an Android app on a Samsung Galaxy Note 5, suggesting that the system is wearable, accurate in finger gesture differentiation, and capable of online operation. In addition, the researchers found that their prototype can run for up to 6.1 h on a smartwatch alone; with computational processing offloaded to a phone, using Bluetooth to transmit the data, the power consumption of the device decreases further, allowing it to run for over 24 h on a standard smartwatch. As such, this device presents a significant step in the adoption of gesture recognition using wearable hardware that is already commercially available.
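Precision and recall, the two figures quoted above, are per-gesture ratios computed from predicted and true labels. The following is a minimal sketch of that computation; the function name and the gesture labels are hypothetical and are not taken from Zhao et al.

```python
from collections import Counter


def precision_recall(true_labels, predicted_labels):
    """Return {label: (precision, recall)} for every label seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # `guess` was predicted, but wrongly
            fn[truth] += 1  # `truth` was the real gesture and was missed
    scores = {}
    for label in set(true_labels) | set(predicted_labels):
        predicted = tp[label] + fp[label]  # times the label was predicted
        actual = tp[label] + fn[label]     # times the label actually occurred
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores


# Hypothetical labels for three gestures (illustrative only, not study data).
truth = ["fist", "fist", "open", "open", "point", "point"]
guess = ["fist", "open", "open", "open", "point", "fist"]
scores = precision_recall(truth, guess)
```

Averaging the per-gesture values gives a single summary figure of the kind reported for the nine-gesture task.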
The next logical step after gesture recognition is prosthetic control. In 2017, Guo et al. [179] built a wearable NIRS-sEMG system for both offline gesture classification and online virtual prosthetic control. The researchers tested both LDA and SVM to classify gestures and added the following performance metrics for real-time applications: selection time (the time required to produce the first correct prediction), completion time (the time needed to correctly guess 10 gestures), completion rate (the percentage of completed motions within 5 s), and real-time accuracy (the prediction accuracy from the first correct prediction to the end of the task). Using either the LDA or SVM classifiers, the offline classification accuracy in able-bodied subjects surpassed 97% when using the combined NIRS-sEMG system, and in amputee subjects, the average accuracy was around 87%. In the online performance scenario, all metrics were improved, both in average and in variance, by using the combined system. The selection time and completion time were the lowest with the combined modality (0.27 and 1.29 s, respectively), while the completion rate and real-time accuracy were the highest (94% and 90%, respectively). The low selection and completion times, combined with the high completion rate and real-time accuracy, make the intuitive, online control of a prosthesis much more feasible using this system.
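The four online metrics defined above can be computed from a timestamped stream of classifier outputs. The sketch below is one possible reading of those definitions, not the authors' implementation: the function name, the argument names, and the choice to end the task at the tenth correct prediction (or at the end of the stream) are assumptions.

```python
def trial_metrics(times, predictions, target, needed=10, timeout_s=5.0):
    """Summarize one online trial from a stream of classifier outputs.

    times: ascending prediction timestamps in seconds from movement onset.
    predictions: the label predicted at each timestamp.
    """
    correct = [label == target for label in predictions]
    if not any(correct):
        return {"selection_time": None, "completion_time": None,
                "completed": False, "realtime_accuracy": None}

    first = correct.index(True)  # index of the first correct prediction
    completion_time = None
    end = len(correct)           # assumed: task runs to the end of the stream
    hits = 0
    for i, ok in enumerate(correct):
        hits += ok
        if hits == needed:       # the `needed`-th correct prediction ends the task
            completion_time, end = times[i], i + 1
            break

    window = correct[first:end]  # first correct prediction to end of task
    return {
        "selection_time": times[first],
        "completion_time": completion_time,
        "completed": completion_time is not None and completion_time <= timeout_s,
        "realtime_accuracy": sum(window) / len(window),
    }


# Hypothetical stream: one prediction every 0.1 s while the user attempts "grip".
times = [k / 10 for k in range(1, 15)]
predictions = ["rest", "rest", "grip", "grip", "rest"] + ["grip"] * 9
metrics = trial_metrics(times, predictions, "grip")
```

Under these assumptions, the completion rate over a session is the fraction of trials whose `completed` flag is true.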

Ultrasound
The ability to correlate muscle activity to muscle output force is essential to the intuitive control of assistive devices, yet few noninvasive methods exist to accomplish this reliably. sEMG is often used for such tasks and is often preferred for both its high temporal resolution and its ability to detect isometric contractions, but it often has difficulty with the spatial localization of muscle activation. As an alternative, Hallock et al. [180] developed an ultrasound-based system utilizing the standard iterative Lucas-Kanade method of optical flow estimation to track muscle thickness and correlate it with force output. While the system was not wearable, its capability to correlate muscle thickness to muscle output force for isometric contractions is particularly unique in the field of ultrasound development and is of particular interest for rehabilitative purposes. In addition, the greater spatial specificity offered by ultrasound might allow for the estimation of muscle output force for distinct muscles at a higher spatial resolution than sEMG.
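To make the optical flow step concrete, the sketch below implements a single Lucas-Kanade update for one image patch: it solves the least-squares form of the brightness-constancy constraint Ix·dx + Iy·dy + It = 0 over all pixels in the patch. This is a simplified, non-iterative illustration and not the pipeline of Hallock et al.; the function names and the synthetic test frames are assumptions. The iterative method repeats this update after warping the second frame by the current estimate.

```python
import math


def lucas_kanade_shift(frame0, frame1):
    """Estimate one (dx, dy) displacement, in pixels, for a whole patch.

    Frames are equal-sized 2D lists of intensities indexed as frame[row][col].
    """
    rows, cols = len(frame0), len(frame0[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for r in range(1, rows - 1):      # skip the border: central differences
        for c in range(1, cols - 1):
            ix = (frame0[r][c + 1] - frame0[r][c - 1]) / 2.0  # d/dx (columns)
            iy = (frame0[r + 1][c] - frame0[r - 1][c]) / 2.0  # d/dy (rows)
            it = frame1[r][c] - frame0[r][c]                  # temporal change
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    # Solve [[sxx, sxy], [sxy, syy]] @ [dx, dy] = [-sxt, -syt] by Cramer's rule.
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("patch has too little texture to resolve motion")
    dx = (-sxt * syy + syt * sxy) / det
    dy = (-syt * sxx + sxt * sxy) / det
    return dx, dy


def blob(cx, cy, size=64, sigma=6.0):
    """Synthetic frame: a Gaussian spot centred at column cx, row cy."""
    return [[math.exp(-((c - cx) ** 2 + (r - cy) ** 2) / (2 * sigma ** 2))
             for c in range(size)] for r in range(size)]


# The spot moves by 0.4 px in x and -0.3 px in y between the two frames.
frame0 = blob(32.0, 32.0)
frame1 = blob(32.4, 31.7)
dx, dy = lucas_kanade_shift(frame0, frame1)
```

In a thickness-tracking setting, one would presumably apply such an estimate to patches on the superficial and deep muscle boundaries and take the change in their separation; that step is not shown here.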
The online detection of gestures has also proven to be a difficult task in the past. Lu et al. [1] designed a wearable real-time gesture recognition scheme based on four-channel A-mode ultrasound, with data processing using PCA and gesture classification using LDA and SVM. The researchers acquired data from 10 subjects, each performing 10 common gestures over a total of 10 rounds. In the offline analysis, the system achieved a 96.92% average recognition accuracy. In the online, or real-time, experiment, the system achieved on average a 96% motion completion rate and a recognition accuracy of 83.8%. While the recognized gestures involved primarily flexion/extension and abduction/adduction of the fingers alone, this work still represents a significant advancement in the online detection of gestures using an entirely wearable system.
For prosthetic control, a wearable muscle tracker must provide muscle contraction information in real time to prevent a lag between the inception and actuation of a movement. However, using ultrasound to track morphological deformation of muscles can prove computationally expensive, owing to the high spatial and temporal resolution of the data acquired from M-mode ultrasound, thus preventing real-time applications. In 2021, Yang et al. [181] developed a wearable multichannel A-mode ultrasound system for multiperspective muscle contraction detection. In addition, they tested the ability to detect and perform (through a virtual prosthetic) 3-degree-of-freedom movements. Across three subjects, eight distinct movements, and four trials per movement, the average successful completion rate was 100%, the path efficiency 93.3%, and the completion time 15.61 s. For comparison, Ortiz-Catalán et al. [182] performed a 3-DoF target achievement control (TAC) test using sEMG signals and a multilayer perceptron, achieving a task completion rate of 93.1%, a path efficiency of 55.8%, and a task completion time of 7.3 s. As such, the work by Yang et al. represents a promising step in the field of prosthetic control through noninvasive, wearable devices.
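Path efficiency is not defined in the text above. In TAC-style tests it is commonly taken as the shortest possible path from the start posture to the target divided by the path actually travelled, and the sketch below assumes that definition; it may differ in detail from the one used in the cited studies. The function name and the example trajectory are hypothetical.

```python
import math


def path_efficiency(trajectory):
    """Straight-line start-to-end distance divided by the distance travelled.

    trajectory: sequence of points, one coordinate per degree of freedom.
    """
    travelled = sum(math.dist(a, b) for a, b in zip(trajectory, trajectory[1:]))
    if travelled == 0:
        raise ValueError("trajectory contains no movement")
    return math.dist(trajectory[0], trajectory[-1]) / travelled


# Hypothetical 3-DoF trajectory that reaches the target by moving one DoF at a time.
trajectory = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
efficiency = path_efficiency(trajectory)
```

A value of 1 corresponds to moving all degrees of freedom simultaneously along the direct path; sequential, one-joint-at-a-time control lowers the score even when the task is completed.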

Conclusion
This review aims to reveal the landscape of the emerging field of muscle tracking through a number of applications that pave the way for the development of seamless human-machine interfaces. For example, using a 128-electrode sEMG system, Chen et al. [131] were able to accurately differentiate 30 gestures, a feat few other applications have achieved. Similarly, Lu et al. [1] designed a wearable A-mode ultrasound device to differentiate 10 gestures in real time, making it one of the few wearable ultrasound sensors with applications for real-time gesture recognition. In addition, Shi et al. [159] made use of both sEMG and ultrasound imaging to estimate torque generation from skeletal muscle activity, prompting further exploration into how combinations of sensing modalities may open new avenues for muscle tracking.
Even outside of the realm of skeletal muscle imaging, researchers have made impressive use of sensing modalities and algorithms that show significant promise for muscle tracking. Ma et al., through the use of deep learning, were able to improve the spatial resolution of NIRS-derived images postacquisition, thus providing a mechanism to upsample input data, thereby potentially reducing data acquisition time significantly. [137] Abibullaev et al., [157] using NIRS to measure cerebral hemodynamics, were able to accurately classify four distinct brain activities based on measured NIR absorbance from 19 channels, suggesting potential applications to gesture recognition using NIRS.
However, a number of obstacles remain to be overcome to further facilitate the practical implementation of muscle-tracking technologies. The first is creating algorithms that functionally run in real time, meaning the time taken to process data is low enough to cause no perceivable delay to the user. A number of variables can affect processing time for a given algorithm: on-chip processing power, memory requirements, power supply, and communication latency. The last of these refers specifically to the latency introduced by where the processing takes place. In on-chip processing, the majority of the latency is introduced by the transfer of data from memory to the processing unit and back, as well as by the algorithmic operations themselves. In the case of central-station processing, where the wearable device transmits data to a central processing unit (often via Bluetooth), additional latency is introduced by the wireless TX of data. Finally, in the case of cloud computing, a further layer of latency is introduced by the infrastructure between the wearable device and the server. Most of the reported online wearables appear to use either on-chip or central-station processing, likely due to the higher latency introduced by cloud-based computing. It should be noted that, generally, both latency and computational power increase in the shift from on-chip to central-station processing and from central-station to cloud processing. Power consumption may also be an important consideration in the field of wearable systems; Zhao et al. [151] demonstrated that the battery life of the same wearable system for NIRS increased when processing was offloaded to a central station (an Android phone) via Bluetooth. As such, the choice of processing station can be understood as a tradeoff between computational power, latency, and power consumption. In addition, the algorithms themselves can place different demands on the system. DNNs, for example, tend to require more memory than some ML algorithms because of their larger number of mutable parameters. However, as we demonstrated in our Applications section, there are a number of examples of various algorithms (time-frequency domain, machine learning, and deep learning) being employed for online data processing on wearable systems. In addition, as on-chip processing capabilities increase, the range of algorithms available for online applications will also increase. It should be noted, however, that while certain systems have demonstrated real-time operation, they often either require a perceivable processing time to complete analysis or suffer from reduced accuracy or classification variety in exchange for faster processing.
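The tradeoff between latency and computational power across processing stations can be illustrated with a simple additive model: total latency is the link latency plus the compute time, where compute time is the model's operation count divided by the station's throughput. The sketch below rests entirely on that assumption, and every number in it is a hypothetical placeholder chosen for illustration, not a measurement from any cited system.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    ops_per_ms: float  # hypothetical compute throughput
    link_ms: float     # hypothetical transfer latency (0 for on-chip)


# Placeholder values only: throughput and link latency both grow with distance.
SITES = [
    Site("on-chip", ops_per_ms=1e4, link_ms=0.0),
    Site("phone", ops_per_ms=1e6, link_ms=20.0),
    Site("cloud", ops_per_ms=1e8, link_ms=120.0),
]


def latency_ms(site, model_ops):
    """Additive latency model: link latency plus compute time."""
    return site.link_ms + model_ops / site.ops_per_ms


def fastest_site(model_ops):
    """Return the processing site with the lowest modelled latency."""
    return min(SITES, key=lambda site: latency_ms(site, model_ops))


small, medium, large = 1e5, 1e7, 1e9  # hypothetical operation counts per inference
choices = [fastest_site(ops).name for ops in (small, medium, large)]
```

Under these placeholder values, the lightest model is fastest on-chip, while heavier models favor the phone and then the cloud once compute time outweighs link latency. The model ignores power consumption, which the text identifies as the third axis of the tradeoff.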
The second obstacle is the ability to estimate muscle activity without muscle length changes. This matters because patients with significant muscle weakness may attempt movements without succeeding in moving the limb, effectively producing an isometric contraction. The EMG-ultrasound sensor of Shi et al. [159] succeeds in estimating torque generation from isometric contractions, making it an excellent step toward solving this problem. The third and final obstacle is the ability to differentiate complex movements. Complex movements are those that operate in multiple degrees of freedom, often utilizing superficial and deep layers of muscles simultaneously, as well as those involving movement around multiple joints. While many of the applications discussed previously tackle one or more of these issues, to our knowledge, few of them tackle all three simultaneously while also being wearable, making this an area to be explored further.
While our review focused on sensor modalities that can elucidate internal muscle activity, there are a number of modalities used to extract kinematic information, thus providing an indirect estimate of muscle activity. Cotton and Rogers, [183] for example, developed a wearable sEMG system that also leveraged a nine-axis inertial measurement sensor to measure not only muscle activity but also rotation. In addition, Si et al. (2022) [184] published a comprehensive overview of flexible strain sensors utilized in wearable hand gesture recognition, detailing recent advances in material construction and algorithm applications for extracting kinematic information. Given the advancements made in this field, combining modalities that extract internal muscle activity with those that extract kinematic information could advance the field of muscle tracking even further.
Taking inspiration from current research-be it combining multiple sensing modalities or making use of novel algorithms-will allow us to overcome the aforementioned obstacles and in turn move the field of muscle-tracking toward true human-machine interfaces. The clinical implications of this are massive when considering the prevalence of conditions that necessitate such devices, and the resource cost of current means of treatment, making this a worthy area to focus the efforts of future research.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.