Control and Autonomy of Microrobots: Recent Progress and Perspective

After decades of development, microrobots have exhibited great application potential in the biomedical field, such as minimally invasive surgery, drug delivery, and bio‐sensing. Compared with conventional medical robotic systems, microrobots may be capable of reaching more narrow and vulnerable regions in the human body with minimal damage. However, limited by the small scale of microrobots, microprocessors, power supplies, and sensors can hardly be integrated on‐board. Thus, new strategies for the actuation and feedback for microrobots need to be explored. Furthermore, the open‐loop control method accomplished by operators may lack accuracy, and long‐duration operation could bring a severe physical challenge in many applications. Consequently, the automatic control of microrobots with the aid of control theories is developed to improve the control efficiency and precision. To further promote the automation level of microrobots, machine learning algorithms are expected to provide a solution to let microrobots adapt to more dynamic environments and undertake more complex medical tasks. Herein, a systematic introduction of the manipulation of microrobots from open‐loop to closed‐loop control is given in this review. It is envisioned that microrobots will play an important role in future biomedical applications.

To date, various methods have been proposed to actuate microrobots, such as bio-hybrid actuation, chemical fuel, electrical field, light, acoustic wave, and magnetic field. [29][30][31][32][33][34][35] Furthermore, multiple stimuli can be employed simultaneously to achieve actuation and precise navigation of microrobots. [36] By adjusting the control inputs of the external stimuli, parameters like speed, moving direction, and geometric shape of the robot can be tuned accordingly.
To accomplish practical medical applications (e.g., active delivery for localized therapy), precise motion control is often required, which demands a highly effective feedback method. For most in vitro application scenarios, direct optical vision-based feedback can efficiently provide information about the microrobot, including position, posture, and moving speed. [37] However, vision-based feedback alone may not provide enough contrast between microrobots and the background when disturbances exist in the working environment; moreover, this method is difficult to employ in some in vivo or ex vivo applications where microrobots are covered by biological tissue or immersed in opaque biofluid. To promote the application potential of microrobots in diverse scenarios, various advanced imaging technologies have been explored, such as ultrasound (US) imaging, [38,39] fluorescent imaging (FI), [24,40] and magnetic localization. [41][42][43] Combining the actuation input and the feedback information, manipulation of microrobots with high precision is possible. If the task is accomplished by an operator, namely human-in-the-loop, it is referred to as open-loop control. Although the open-loop strategy does not demand complicated control algorithms, which simplifies the overall control system, its manipulation performance relies heavily on the experience of the operators. Moreover, the concentration of the operators may decline during long operations. To tackle these issues, a closed-loop control scheme is of great research value. The locomotion information of microrobots can be obtained from the feedback images with the aid of computer image processing technologies. Control theories [44,45] provide promising solutions for the automatic control of microrobots, and together with planning algorithms, more practical applications like targeted navigation can be achieved. [46,47] Moreover, with intelligent algorithms like convolutional neural networks (CNNs) [48] and reinforcement learning, [49] microrobots have the potential to perform higher-level automatic behaviors. The overall workflow for automatic control of microrobots is shown in Figure 1.
This review aims to introduce the recent development of the open-loop control and automatic control of microrobots. To begin with, the mainstream actuation methods and imaging strategies are introduced as the control input and feedback parts of the classic closed-loop control system. Second, closed-loop control applications with different automation levels and different agent numbers are introduced. Subsequently, machine learning algorithms that assist microrobots in undertaking more complex and more diverse tasks are presented. Finally, the review ends with conclusions and an outlook.

Actuation Strategies of Microrobots
Depending on the fabrication materials, structures, and working environments, microrobots can be powered by various mechanisms, including bio-hybrid actuation, electrical field, acoustic wave, light radiation, and magnetic field. In addition to being actuated by a single power source, locomotion governed by multiple stimuli possesses compound advantages. The detailed categories, application scenarios, advantages, and limitations of different actuation methods have been comprehensively summarized by many researchers. Readers can refer to review articles [2,34,50-60] and the corresponding research articles given in Table 1 for more technical content.

Localization Strategies of Microrobots
The tracking methods of microrobots consist of image-based methods and sensor-based methods. The image-based methods include optical imaging, FI, US imaging, magnetic resonance imaging (MRI), magnetic particle imaging (MPI), etc. Image processing techniques can be utilized to extract information about microrobots, such as position, moving speed, and posture. Alternatively, sensor-based tracking methods like magnetic localization and radiofrequency positioning provide the position and posture information from the sensor readings with the aid of decoding algorithms. In Table 2, the spatiotemporal resolution, localization dimension, and representative references for different methods are summarized.

Optical Imaging
For most in vitro applications in open and visible environments, visual feedback from cameras or optical microscopes can provide sufficient information about the microrobots. [61] Image processing methods like frame differencing [62] and shape fitting [63] are applicable to the tracking and pattern recognition of microrobots. Researchers have also used a triangular positioning algorithm to obtain the 3D position of a microrobot (Figure 2a). [64] Furthermore, to enhance the imaging contrast and distinguish individual microrobots, 3D printing was applied to fabricate color-expressing nanostructures on microrobots. [65] Under visible-light illumination, the microrobots provide vivid color expressions for tracking.
However, in most ex vivo and in vivo scenarios, microrobots inside organisms are not directly visible; moreover, when microrobots are immersed in some biofluids, such as whole blood, vision-based feedback by camera or optical microscope is not applicable either. Thus, other imaging methods suitable for more complex application situations need to be explored.
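As an illustration of the frame-differencing idea mentioned above, the following minimal Python sketch estimates the position of a single moving microrobot from two consecutive grayscale frames. It is not the method of the cited works: the function name `track_by_frame_difference`, the threshold value, and the assumption of a bright robot on a darker, static background are all hypothetical choices made for this example.

```python
import numpy as np


def track_by_frame_difference(prev_frame, curr_frame, threshold=30):
    """Estimate the (row, col) centroid of a bright moving object.

    Assumptions (illustrative only): both frames are 2D uint8 grayscale
    arrays of equal shape, the background is static, and the robot is
    brighter than the background.
    """
    # Cast to a signed type so the subtraction cannot wrap around.
    diff = curr_frame.astype(np.int32) - prev_frame.astype(np.int32)
    # Pixels that became brighter mark the robot's new location; pixels
    # that became darker mark where it used to be and are ignored.
    newly_bright = diff > threshold
    if not newly_bright.any():
        return None  # no motion detected between the two frames
    rows, cols = np.nonzero(newly_bright)
    return float(rows.mean()), float(cols.mean())
```

In practice the raw difference image is usually denoised (e.g., by morphological filtering) before the centroid is computed, and the threshold has to be tuned to the imaging contrast.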

FI
The FI technique has been a powerful tool for the examination of histological tissue or living cells in biological research. Under the excitation of light with specific wavelengths, fluorescent probes including quantum dots, organic dyes, and biological organisms can exhibit fluorescence in different ranges. [66] Benefiting from the advantages of high contrast, high sensitivity, molecular resolution, and low cost, FI has become one of the most popular imaging strategies in the biomedical field. [67,68] The spatial and temporal resolution depends on the fluorescence microscope used and can reach the submicrometer scale and 200 ms, respectively. The high resolution and high imaging frame rate make FI suitable for real-time manipulation of microrobots. Using probes with better fluorescent properties can enhance the contrast of FI. The main limitation of FI is its shallow penetration depth, which restricts its use in deep-tissue applications. Y. Zhang et al. presented a remote detection method for C. diff toxin. In this research, FI was employed to track G. lucidum spores functionalized with carbon dots, and the decay of fluorescence intensity reflected the toxicity of the stool specimen (Figure 2b). [24]

US Imaging and Photoacoustic Imaging
As a real-time, harmless, well-established technique, US imaging plays a vital role in medical imaging research. [69] The temporal resolution of US imaging basically depends on the traveling speed of the US wave. Typically, the frame rate of US imaging can reach 100 Hz. The spatial resolution of US imaging comprises lateral resolution and axial resolution. The lateral resolution is determined by the geometry of the US probe, and the axial resolution (typically ≈500 μm) is related to the frequency of the sound wave. With a higher frequency, US imaging gives a better axial resolution, yet the imaging depth will be relatively shallow since high-frequency US waves are attenuated more easily. The imaging contrast is related to the acoustic impedance difference between the target microrobot and the surrounding environment, thus by increasing the acoustic impedance of the microrobot (e.g., air

Table 1. Representative research articles about the actuation of microrobots using different power sources.
Owing to the various operation modes (B mode and Doppler mode), this method can be applied in different scenarios. [70] Sitti and co-workers developed a magnetic soft-bodied microrobot with multimodal locomotion. When operated in ex vivo environments (like chicken tissue), the position and posture of the soft robot were obtained by B mode US imaging. [71] E. Niedert et al. applied B mode US imaging to localize a tumbling microrobot in a porcine colon, demonstrating the biocompatibility of US imaging (Figure 2c). [72] Also, a US probe can be integrated into a mobile parallel coil system to enable large-scale tracking of a microswimmer with high robustness. [73] In addition to B mode, US can also be operated in Doppler mode, which is able to recognize the motion status of objects based on the Doppler effect. [74,75] Zhang's group reported a strategy to navigate a magnetic microrobot swarm in blood flow using Doppler mode. The local flow speed difference induced by the rotating magnetic microrobot swarm was tracked by US, and subsequently, the position of the microrobot swarm was located (Figure 2d). [74] Besides being emitted by a US probe, the US signal can also be generated under the excitation of laser pulses, known as the photoacoustic (PA) effect. [76] The spatial resolution of photoacoustic imaging (PAI) is typically hundreds of microns, which is determined by the ultrasound detection array and the wavelength of the applied light. By reducing the wavelength of the light and increasing the detection bandwidth of the light-ultrasound transducer, PAI provides a higher spatial resolution. In real-time tracking research, PAI could provide feedback images at a frame rate of 2-10 Hz. The imaging contrast of PAI depends on the optical property difference between tissues. Z. Wu et al. developed a type of micromotor capsule for in vivo targeted drug release. [77] The migration of the capsules in the gastrointestinal tract was monitored by PA computed tomography. Xie and co-workers reported a polydopamine-coated magnetic Spirulina swimmer for the treatment of in vivo bacterial infection, and PAI was adopted for tracking the swimmers. [78] Instead of being applied separately, US imaging and PAI were combined in a dual-mode imaging method proposed by D. Xu for the localization of liquid metal nanobots. [79]

Figure 2 (panel credits). Panel (a): Reproduced with permission, [64] Copyright 2020, IEEE. Panel (b): Reproduced with permission, [24] Copyright 2019, AAAS. Panel (c): Reproduced with permission, [72] Copyright 2018, MDPI. Panel (d): Reproduced with permission, [74] Copyright 2021, AAAS. Panel (e): Reproduced with permission, [87] Copyright 2021, AAAS. Panel (f): Reproduced with permission, [90] Copyright 2019, IEEE. Panel (g): Reproduced with permission, [93] Copyright 2015, IEEE. Panel (h): Reproduced with permission, [94] Copyright 2013, IEEE.

MRI and MPI
MRI is based on the excitation and relaxation of hydrogen atoms in the human body. According to the obtained map of atom distribution, the image of tissues is reconstructed. [80] MRI possesses advantages such as no ionizing radiation and better contrast for biological tissue. The spatial resolution of MRI can be hundreds of microns. However, the temporal resolution is relatively low: it can take several minutes or longer to acquire an image. The spatial resolution can be enhanced by reducing the field of view, and there is a trade-off between spatial and temporal resolution: a longer sampling time yields a better spatial resolution but an inferior temporal resolution. This compromise is the main challenge for MRI in real-time applications. D. Folio et al. utilized an MRI system for the steering and actuation of ferromagnetic microrobots. [81] Also, swarms of nanoparticles and bacteria can be localized using MRI for micromanipulation tasks. [82] MPI directly detects iron oxide particle tracers and constructs 3D structures with exceptional contrast and temporal resolution. [83] The spatial resolution of MPI is at the millimeter scale. By increasing the magnetic field gradient or fabricating the microrobot from paramagnetic materials with higher susceptibility, the spatial resolution can be improved. The temporal resolution ranges from hundreds of milliseconds to several seconds. Although MPI has been widely applied to localize miniature magnetic objects, its main limitation is that the spatial resolution still needs improvement. K. Bente et al. presented a selective actuation scheme for millimeter-scale active magnetic matter clouds. [84] The 3D model of the clouds was reconstructed using MPI. Bakenecker's group achieved magnetic actuation and imaging for the navigation of a magnetic microrobot in a human cerebral aneurysm phantom via an MPI scanner. [85] Tai and co-workers combined MPI and magnetic hyperthermia to build an image-guided theranostic platform, [86] which addressed the accumulation of tracer particles in off-target organs and prevented heat damage to those organs caused by localization difficulty.

Radionuclide Imaging
Radionuclide imaging (RI) is a type of imaging technique based on the introduction of exogenous agents, namely radionuclides. RI offers advantages including high sensitivity and no penetration limitation. RI comprises γ scintigraphy and emission computed tomography (ECT). ECT can be further categorized into the positron type (PET) and the single-photon type (SPECT). H. Hortelao et al. presented in vivo monitoring of enzyme-powered nanomotors in a mouse bladder. The imaging was achieved by PET combined with computed tomography (PET-CT). [87] The tracking ability of PET-CT with a large population of self-propelled microrobots was demonstrated by D. Vilela. [88] The spatial resolution of PET-CT can be less than 1 mm. However, it can take more than one minute to acquire an image. Compared to other ionizing radiation-based methods (e.g., X-ray), the radiation dose of PET-CT is lower, yet the long acquisition time hinders its further application in real-time tasks.

Optical Coherence Tomography
Optical coherence tomography (OCT) imaging provides a higher imaging resolution than modalities like US imaging and has good compatibility with magnetic actuation methods. OCT possesses high spatial resolution (≈10 μm), and its high imaging frame rate (≈100 Hz) makes it suitable for real-time imaging and manipulation tasks. However, compared to imaging methods like US and MRI, the penetration depth of OCT is shallower (about 1 mm), which makes it unsuitable for some in vivo applications. Z. Wu et al. developed, for the first time, a type of helical micropropeller that could penetrate the vitreous humor and reach the retina, with OCT as the imaging scheme. [89] The in vivo tracking performance of OCT was further validated by Li. [90]

Magnetic Localization
Magnetic localization has gained increasing attention from researchers in recent years. Magnetic fields can penetrate biological tissues without significant attenuation. Thus, magnetic localization is widely utilized in biomedical applications, especially for tracking wireless capsule endoscopes (WCEs). [91,92] There are two commonly used categories of magnetic localization methods: the first places permanent magnets inside the capsule, and the magnetic field is measured by external sensors; the second embeds magnetic sensors inside the capsule, and the position of the capsule is obtained from the sensor readings. The spatial resolution of magnetic localization is related to many factors: the accuracy of the model of the microrobot's magnetic field distribution, the performance and number of Hall sensors, and external magnetic field disturbances. The temporal resolution mainly depends on the algorithm for position decoding. Also, an advanced algorithm can improve the localization accuracy. As depicted in Figure 2g, a 2D Hall-effect sensor array was fabricated, and 5D localization of an untethered magnetic robot under an external driving magnetic field was achieved. [93] K. M. Pope et al. embedded Hall-effect sensors in a magnetic capsule, whose readings can be sent wirelessly to the host PC for localization calculation. [41] Under the control of a permanent magnet source, six-degree-of-freedom (DOF) localization and rotating propulsion were realized simultaneously.

Radiofrequency Localization
Radiofrequency (RF) localization mainly focuses on analyzing high-frequency electromagnetic signals to obtain the locations and postures of microrobots (Figure 2h). [94,95] The RF-based strategy possesses the advantages of low hardware cost and high applicability. Similar to magnetic localization, the spatial resolution of the RF-based method is related to the accuracy of the RF waves and the performance of the receivers; the temporal resolution is determined by the data transmission speed and decoding algorithms. Unlike magnetic localization, RF localization processes high-frequency electromagnetic waves to localize the microrobot, which enables it to effectively eliminate the disturbance of external magnetic fields. Researchers presented a one-stage tissue-adaptive RF-based method and achieved precise localization of a WCE (Figure 2f). [96] Combined with other localization techniques, the positioning accuracy of RF localization can be enhanced. G. Bao et al. presented a hybrid strategy that combined vision imaging via the visual sensor embedded in the capsule with RF localization. [97] According to the experimental results, the tracking error of the combined method was reduced to 2.3 cm, compared with 6.8 cm on average for the RF method alone.

Closed-Loop Control of a Single Microrobot
To establish an effective, precise closed-loop control system, several key elements are indispensable: controllable inputs, accurate feedback information, and high-performance control algorithms. The overall closed-loop control workflow of microrobots is depicted in Figure 3. In Section 2, we have already introduced the mainstream actuation and tracking methods for microrobots. However, unlike traditional controlled objects such as motors, drones, and electromagnets, closed-loop control of microrobots faces some challenges. First, microrobots have different structural designs, fabrication materials, and locomotion mechanisms, [35,98-100] which means the corresponding dynamics need to be studied and modeled separately; second, at the micro/nanoscale, external disturbances and dynamic uncertainties may have a greater influence on the motion of microrobots. [101][102][103] Consequently, more in-depth study of advanced control algorithms that overcome external disturbances and system uncertainties is crucial.

Proportional-Integral-Derivative (PID) Control
PID has been one of the most powerful control laws since it was proposed in the 1920s-1940s. [104] After almost a century of development, it is widely used in industry. A PID controller does not demand an accurate mathematical model of the controlled system, and the control mechanism is straightforward. These advantages make the PID controller an ideal candidate for microrobot manipulation research. L. Yang et al. proposed a model-free control scheme for a two-particle magnetic microrobot system (Figure 4a). [44] A proportional-integral (PI) controller was established for the position control, and an extended state observer (ESO) was adopted for disturbance observation and compensation. The locomotion error of the microrobot could be restrained within 10 μm. Besides surface rollers, closed-loop control of microrobots with other movement mechanisms (e.g., helical-shaped swimmers, soft-bodied swimmers, [64] and field gradient-actuated micromagnets [105]) was also accomplished with PI and PID controllers. Pawashe's group achieved visual servoing of a Mag-μBot using a PI controller for multimodal micromanipulation applications. [106] In ex vivo scenarios, the PID controller has also exhibited good performance. Li and co-workers achieved vision-based closed-loop steering of a microrobot in a zebrafish embryo with high precision. [99] PID controllers are also widely used in guidewire systems. Researchers have proposed insertion regulation [107] and bending direction error attenuation [22] of guidewires using PID. A magnetotactic bacterium (MTB) can perform controllable migration under the excitation of an external magnetic field. Khalil and co-workers achieved closed-loop control of MTB. [108] The bacterium tended to align itself with the field direction, and the actuation was governed by its flagella bundle and magnetic force. A PD controller was adopted for the closed-loop control, and an auxiliary input was responsible for decreasing the velocity of the MTB for higher positioning accuracy.

Figure 3. The overall diagram for closed-loop control of a microrobot with path planning. The reference waypoints are provided by the path planning algorithm; according to the reference states and the feedback information, the calculated control inputs are delivered to the actuation system, and the microrobot accomplishes the given task.
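To make the PID mechanism discussed in this subsection concrete, a minimal discrete-time sketch in Python is given below. It is not taken from any of the cited works. The plant is assumed, purely for illustration, to be a 1D microrobot whose velocity equals the control input plus a constant drift (e.g., a background flow); the gains, time step, and drift value are hypothetical.

```python
class PID:
    """Textbook discrete PID controller (illustrative, not a cited design)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, reference, measurement):
        error = reference - measurement
        self.integral += error * self.dt
        # No derivative action on the very first sample.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate_position_control(steps=2000, dt=0.01, target=100.0, drift=-5.0):
    """Drive an assumed plant x' = u + drift toward `target` (units: um, s)."""
    pid = PID(kp=4.0, ki=2.0, kd=0.1, dt=dt)
    x = 0.0
    for _ in range(steps):
        u = pid.update(target, x)  # feedback would come from image tracking
        x += (u + drift) * dt      # forward-Euler step of the assumed plant
    return x
```

The integral term is what removes the steady-state offset caused by the constant drift; with a proportional term alone, the robot would settle short of the target.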

Nonlinear Control
Practically, a microrobot is a complicated system with high nonlinearity and uncertainty. Unmodeled dynamics, Brownian motion, [109][110][111] external fluid forces, [101,112] and inaccuracy of the actuation system [113] can all degrade the controllability of microrobots, making classic linear controllers like PID inadequate for some control tasks. Thus, more advanced nonlinear control strategies need to be proposed. For in vivo scenarios, multiple types of disturbance, including drag force, buoyancy, and biofluid flow, may destabilize the control of microrobots. Adaptive control is suitable for systems without accurate mathematical models; researchers have applied adaptive controllers to microparticle manipulation [114] and cell tracking control (Figure 4b) [115] against external disturbances and unknown dynamics. When a system has large parameter perturbations and uncertainties, a well-designed robust controller can guarantee a desirable performance. In the microrobotics field, robust control has been applied based on input-to-state stability (ISS) theory, [116] sliding mode and backstepping control, [112] and H∞ control [117] to deal with disturbances (e.g., hydrodynamic drag force, contact force, weight force, and van der Waals force) and uncertainties existing in working environments. In addition to adaptive control and robust control, researchers have also proposed other nonlinear controllers for the accurate manipulation of microrobots. L. Zheng et al. presented an actuation system capable of navigating a microrobot in 3D space. A prescribed performance controller was proposed for the motion control, and an observer was responsible for disturbance compensation. [118] L. Arcese et al. used a nonlinear backstepping controller combined with a high-gain observer for magnetic particle manipulation in the cardiovascular system to suppress the wall effect at vessel bifurcations. [102] Dong and co-workers proposed a closed-loop control scheme for Caenorhabditis elegans using an optogenetic actuation strategy. [119] The navigation was based on a predictive proportional controller and could achieve micrometer-scale precision. A mean value theorem (MVT) observer-based backstepping controller was presented to navigate a microrobot in a cylindrical vessel environment. [103] The microrobot could be navigated smoothly along the reference path when subjected to external nonlinear disturbances, especially the hydrodynamic drag force. J. Liu et al. proposed a proxy-based sliding mode control strategy and achieved stable 3D control of a helical microswimmer. [120] Ryan's group developed a magnetic actuation system consisting of eight permanent magnets, each capable of rotating about a fixed axis. [121] A nonlinear path-following algorithm was adopted to drive the microrobot to the next waypoint while perpendicularly approaching the path.
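Among the robust strategies mentioned above, sliding mode control is perhaps the easiest to illustrate. The following Python sketch shows a generic boundary-layer sliding mode regulator rejecting a bounded, unknown disturbance. It is not the proxy-based or backstepping design of the cited works; the first-order plant x' = u + d(t), the sinusoidal disturbance, and all numerical values are assumptions made for this example.

```python
import math


def saturate(value):
    """Clamp to [-1, 1]; replaces sign() to reduce chattering."""
    return max(-1.0, min(1.0, value))


def simulate_sliding_mode(reference=10.0, gain=5.0, boundary=0.5,
                          disturbance_amp=2.0, dt=0.001, steps=10000):
    """Regulate an assumed plant x' = u + d(t), with |d| <= disturbance_amp.

    The switching gain must exceed the disturbance bound so that the state
    reaches the boundary layer around the sliding surface s = x - reference.
    Returns the list of tracking errors over time.
    """
    x = 0.0
    errors = []
    for k in range(steps):
        t = k * dt
        s = x - reference                      # sliding variable
        u = -gain * saturate(s / boundary)     # boundary-layer switching law
        d = disturbance_amp * math.sin(t)      # unknown to the controller
        x += (u + d) * dt                      # forward-Euler plant update
        errors.append(x - reference)
    return errors
```

Inside the boundary layer the residual error is bounded by roughly `boundary * disturbance_amp / gain` (0.2 with the values above), which shows the usual trade-off between chattering and steady-state accuracy.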

Optimal Control
Optimization-based control algorithms are able to make the system output track the reference signal with minimum input effort. Even with disturbances and uncertainties, optimization-based strategies can perform closed-loop manipulation of microrobots with high precision. Model predictive control (MPC) is a classic optimal control scheme. The control inputs are generated by minimizing a cost function over the future H steps. Many researchers have employed the MPC scheme on holonomic systems [122,123] and nonholonomic systems. [124] When disturbances (e.g., pulsatile blood flow) exist, MPC can still provide stable control performance. [125,126] The linear quadratic regulator (LQR) algorithm has also been utilized for the manipulation of helical microrobots [62] and spheroidal magnetic microrobots. [127] An optimal minimum variance controller was proposed by Z. Zhang et al. and accomplished steering of a microrobot in a 3D workspace (Figure 4c). [128]

Figure 4 (panel credits). Panel (a): Reproduced with permission, [44] Copyright 2018, IEEE. Panel (b): Reproduced with permission, [115] Copyright 2016, IEEE. Panel (c): Reproduced with permission, [128] Copyright 2012, IEEE. Panel (d): Reproduced with permission, [130] Copyright 2018, SAGE.
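The receding-horizon idea behind MPC, i.e., minimizing a cost over the future H steps and applying only the first input, can be sketched in a few lines of Python. The example below is an unconstrained linear MPC for an assumed 1D double-integrator model (position and velocity); it is not the formulation of any cited work, and the model, weights, and horizon are hypothetical. Practical MPC schemes typically add input and state constraints, which require a numerical optimizer instead of the closed-form solve used here.

```python
import numpy as np


def build_prediction(A, B, H):
    """Stack x_{k+1..k+H} = Phi @ x_k + Gamma @ [u_k, ..., u_{k+H-1}]."""
    n, m = B.shape
    powers = [np.eye(n)]
    for _ in range(H):
        powers.append(A @ powers[-1])
    Phi = np.vstack(powers[1:])
    Gamma = np.zeros((H * n, H * m))
    for i in range(H):
        for j in range(i + 1):
            Gamma[i * n:(i + 1) * n, j * m:(j + 1) * m] = powers[i - j] @ B
    return Phi, Gamma


def mpc_first_input(x, x_ref, Phi, Gamma, Q, R, H):
    """Minimize sum ||x - x_ref||_Q^2 + ||u||_R^2 over H steps; return u_k."""
    Qbar = np.kron(np.eye(H), Q)
    Rbar = np.kron(np.eye(H), R)
    target = np.tile(x_ref, H)
    hessian = Gamma.T @ Qbar @ Gamma + Rbar
    gradient = Gamma.T @ Qbar @ (target - Phi @ x)
    U = np.linalg.solve(hessian, gradient)
    return U[:R.shape[0]]


def simulate_mpc(steps=100, dt=0.1, H=20):
    """Steer an assumed double integrator from rest to position 1."""
    A = np.array([[1.0, dt], [0.0, 1.0]])
    B = np.array([[0.5 * dt ** 2], [dt]])
    Q = np.diag([1.0, 0.1])
    R = np.array([[0.01]])
    Phi, Gamma = build_prediction(A, B, H)
    x = np.zeros(2)
    x_ref = np.array([1.0, 0.0])
    for _ in range(steps):
        u = mpc_first_input(x, x_ref, Phi, Gamma, Q, R, H)
        x = A @ x + B @ u  # apply only the first input, then re-plan
    return x
```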

Linearization of Nonholonomic Systems
For microrobots with nonholonomic locomotion mechanisms, the system dynamics are nonlinear and strongly coupled, which brings difficulties to the controller design. To date, numerous studies have been carried out on simplifying system models. C. Samson proposed a method that transforms a nonlinear system into a chained form and designed a simple and effective controller for a two-wheel robotic system. [129] This method was also utilized for the manipulation of helical-shaped microswimmers. [130,131] The dynamics of a helical swimmer could be expressed in the Serret-Frenet frame, considering the weight and lateral disturbance (Figure 4d). [130] The obtained kinematic model was linearized using the chained form. The 3D closed-loop tracking experiments illustrated the robustness and accuracy of the proposed algorithm. T. Xu et al. achieved planar path following of a scaled-up helical microswimmer. [131] The state model for the swimmer in the horizontal plane was rewritten as a linear system with three states and two inputs, and the desired orientation was obtained accordingly. Similar to the helical swimmer, the guidewire system is also a nonholonomic system. Hong and co-workers proposed a bicycle model to describe the kinematics of the guidewire, and a chained-form transformation was employed for the model linearization and feedback controller design (Figure 4d). [132] The control schemes for the closed-loop control of microrobots are summarized in Table 3.
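The chained-form idea can be illustrated on the simplest nonholonomic model, the planar unicycle (x' = v cos θ, y' = v sin θ, θ' = ω), which is used here only as a stand-in for the swimmer and guidewire models of the cited works. With the standard change of variables z1 = x, z2 = tan θ, z3 = y and inputs u1 = v cos θ, u2 = ω / cos²θ, the kinematics become z1' = u1, z2' = u2, z3' = z2 u1, valid for |θ| < π/2. The Python sketch below implements this transformation; the function names are hypothetical.

```python
import math


def to_chained(state, v, omega):
    """Map unicycle state (x, y, theta) and inputs to chained coordinates.

    Valid only for |theta| < pi/2, where tan(theta) is well defined.
    """
    x, y, theta = state
    z = (x, math.tan(theta), y)
    u = (v * math.cos(theta), omega / math.cos(theta) ** 2)
    return z, u


def chained_rhs(z, u):
    """Right-hand side of the (2,3) chained form: z1'=u1, z2'=u2, z3'=z2*u1."""
    return (u[0], u[1], z[1] * u[0])


def unicycle_step(state, v, omega, h):
    """One forward-Euler step of the unicycle kinematics."""
    x, y, theta = state
    return (x + h * v * math.cos(theta),
            y + h * v * math.sin(theta),
            theta + h * omega)
```

Because z1 can be driven independently by u1, the remaining states (z2, z3) form a system that is linear once u1 is fixed, which is what makes simple feedback designs possible after the transformation.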

Automatic Control of Microrobots with Path Planning Algorithms
In practical biomedical applications, complicated and dynamic obstacles may exist in the working environment. Although progress has been made on highly precise closed-loop control of microrobots, simple locomotion under human commands along manually designed trajectories in free space may not be sufficient for some biomedical tasks (e.g., targeted delivery). Thus, autonomous, collision-free, targeted navigation methods for microrobots in complicated surroundings are essential.

Iteration-/Searching-Based Planning Methods
One of the most popular solutions to this issue is to combine closed-loop control methods with global path planning algorithms. Various types of path planning algorithms, such as the search-based A* method (Figure 5a), [133,134] the rapidly exploring random tree (RRT) (Figure 5b), [47] the optimization-based particle swarm optimization (PSO) method (Figure 5c), [46,122,135] and their modified versions, [120,136] have been adopted for microrobot planning tasks. L. Yang et al. proposed an automatic obstacle-avoidance navigation scheme based on the PSO algorithm (Figure 5c). [122] The same group also employed the PSO algorithm for obstacle-avoidance manipulation of a magnetic droplet swarm. [46] A Pareto optimality-based framework was proposed for path planning and automatic navigation in a complex vascular network. [81] Path generation and multi-objective optimization could be achieved at the same time in this work. Furthermore, path planning and microrobot navigation in 3D environments have also been explored. J. Liu et al. utilized the RRT*-Connect algorithm to generate a collision-free path for a helical magnetic microswimmer. [120] Compared to other RRT-based algorithms, the adopted strategy could obtain an optimal path with the shortest length.
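As a concrete example of the search-based planners listed above, the following Python sketch implements a standard A* search on a binary occupancy grid, such as one that might be obtained by thresholding a feedback image. It is a generic textbook version, not the implementation of the cited works; the 4-connected neighborhood, unit step cost, and Manhattan heuristic are assumptions made for this example.

```python
import heapq


def astar(grid, start, goal):
    """Shortest 4-connected path on a grid (0 = free, 1 = obstacle).

    Returns the list of (row, col) cells from start to goal, or None if
    the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance is admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    best_cost = {start: 0}
    came_from = {}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry; a cheaper route was found already
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(
                        open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None
```

The returned cell sequence would then serve as the reference waypoints for a closed-loop controller, in the manner of the workflow in Figure 3.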

Global Planning Methods with Local Refinement
One major drawback of the methods based on predefined reference paths is that the obtained waypoint sequence may be vulnerable to slight changes in a dynamic environment. On the one hand, in real application environments like biological organs, which are elastic and dynamic, the waypoints planned based on the previous environment may fall inside obstacles during the navigation process; on the other hand, the feasibility of a path planned based on the global environment may suffer from the low resolution of the feedback image, and the waypoints may be unreachable for the microrobot. To date, researchers have proposed various approaches to increase the robustness of the planning algorithms. A hybrid global-local path planning scheme was presented by Yang and co-workers (Figure 5d). [136] The global reference path was generated using the RRT* algorithm with filtering, and the local reference waypoints around the microrobot were refined using a field potential-based algorithm. Z. Yang et al. used a switching scheme to guarantee the robustness of the navigation. When the microrobot was closer to an obstacle than a threshold, the planning method would be switched from A* to fuzzy logic (Figure 5e). [123] Also, another switching-based AI planner, which contained three algorithms for different planning scenarios, was presented by Li and co-workers. [137] Meng's group proposed a motion planning strategy to navigate microrobots in a cardiovascular environment. [112] The global path was planned using a breadth-first search algorithm, and an A* algorithm was used to generate collision-free waypoints based on the specific geometry of the vessel. Ju's group utilized the RRT algorithm for path planning in microrobot-aided manipulation of cells. [138] To deal with the slight diffusion of obstacle cells, the path would be updated if interference might occur. L. Zheng et al. achieved 3D navigation of a magnetic microrobot with a reconstructed 3D obstacle map. The global path was planned using an enhanced RRT (ERRT) algorithm, and a local planning scheme was integrated for dynamic obstacle avoidance (Figure 5f). [118]

Field Potential-Based Methods
Most planning and navigation strategies discussed earlier rely on search or iteration. The plan is generated from the channel and obstacle distribution before the microrobot is navigated, so these methods are adequate for static environments or environments with slight changes. When dynamic obstacles exist in the working environment, a field-potential-based planning algorithm can be utilized for online collision-free navigation. By adding a virtual attractive force field to the target point and virtual repulsive force fields to the obstacles, the microrobot tends to move toward the target point while avoiding the obstacles (Figure 5h). [139][140][141] This online planning scheme may become trapped at a local potential minimum in a narrow, multi-branch environment; [137] yet when a limited number of dynamic obstacles appear around the microrobot, it can provide effective collision-free motion commands without consuming excessive computational resources. Researchers presented a planning algorithm for bacteria-powered microrobots to realize dynamic obstacle avoidance (Figure 5g). [142] The proposed method was based on the combination of the dynamic window approach and the vector field histogram. X. Li et al. demonstrated a strategy named collision-avoidance vector. [143] The feasibility of the proposed scheme was validated via in vivo experiments for targeted transportation of cells.

Figure 5. Targeted navigation of microrobots with path planning. a) A* planning method. b) RRT planning method. c) Particle swarm optimization (PSO) planning method. d) Rapidly exploring random tree (RRT) planning method with local refinement. e) A* planning method with fuzzy-logic local refinement. f) Dynamic enhanced RRT (ERRT) planning method. g) Vector-based method for dynamic path planning. h) Potential-based path planning method. Panel (a): Reproduced with permission, [133] Copyright 2018, IEEE. Panel (b): Reproduced with permission, [47] Copyright 2017, IEEE. Panel (c): Reproduced with permission, [122] Copyright 2019, IEEE. Panel (d): Reproduced with permission, [136] Copyright 2020, IEEE. Panel (e): Reproduced with permission, [123] Copyright 2021, IEEE/ASME. Panel (f): Reproduced with permission, [118] Copyright 2021, IEEE. Panel (g): Reproduced with permission, [142] Copyright 2017, PLOS. Panel (h): Reproduced with permission, [139] Copyright 2021, IEEE.
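The attractive and repulsive virtual fields used by field-potential-based planners can be sketched as follows. This is a minimal illustration, not the implementation of any cited work: it assumes point obstacles in 2D, a linear attractive force, the classic inverse-distance repulsive term with a finite influence radius, and a fixed step length. The gains `k_att` and `k_rep`, the radius `influence`, and the function names are all assumptions.

```python
import math


def potential_field_step(pos, goal, obstacles, k_att=1.0, k_rep=0.5,
                         influence=1.5, step=0.1):
    """Return the next position after one fixed-length step along the net force."""
    # Attractive force pulls linearly toward the goal.
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        # Repulsion acts only inside the influence radius of each obstacle.
        if 1e-9 < d < influence:
            mag = k_rep * (1.0 / d - 1.0 / influence) / d ** 2
            fx += mag * dx / d
            fy += mag * dy / d
    norm = math.hypot(fx, fy)
    if norm < 1e-9:
        # Zero net force: the goal itself or a local minimum of the potential.
        return pos
    return (pos[0] + step * fx / norm, pos[1] + step * fy / norm)


def navigate(start, goal, obstacles, tol=0.1, max_steps=500):
    """Iterate the potential-field step until the goal is within tol."""
    path = [start]
    for _ in range(max_steps):
        if math.dist(path[-1], goal) <= tol:
            break
        path.append(potential_field_step(path[-1], goal, obstacles))
    return path
```

Because the force is recomputed from the current obstacle positions at every step, the same loop handles moving obstacles by passing an updated `obstacles` list each iteration; the zero-force branch is where the local-minimum trapping mentioned above would show up.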

Obtain the Planning Environment Using Medical Imaging Modalities
The planning of the reference trajectory demands prior knowledge of the working environment, including the accessible area and the obstacle distribution. For biomedical application scenarios, medical imaging methods can be adopted to map the internal environment of patients. Z. Yang et al. realized targeted navigation of a guidewire in a vascular phantom. [107] They conducted a global scan of the phantom using US, and the centerline of the phantom was reconstructed from the scanning result. K. Belharet et al. utilized magnetic resonance angiogram (MRA) imaging to obtain the vessel distribution of the patient; the reference trajectory could then be generated from the maximum intensity projection (MIP) of the MRA image. [101] Folio's group utilized an MRI setup to scan a vascular phantom and obtained the internal environment for planning. [81]

5. Multiagent Control of Microrobots

Independent Control of Multiple Microrobot Agents
Owing to the small scale of a microrobot, the tasks that a single agent can complete are highly limited in some applications (e.g., targeted therapy and micromanipulation). One possible solution is the simultaneous control of multiple microrobot agents. In traditional large-scale robotic systems, onboard microprocessors and communication modules allow mutual communication between agents and wireless independent control from the host computer, which together provide various strategies for multiagent control. However, owing to the limitations of microrobots discussed in Section 1, those strategies are hard to apply. To date, several strategies have been reported for the independent control of microrobots: independent control via heterogeneous designs of microrobots, independent control via nonuniform control input, selective control of multiple microrobots, and independent control exploiting the interaction forces between microrobots.

Independent Control via Heterogeneous Designs of Microrobots
For microrobots under the control of a uniform external stimulus, heterogeneous designs of different agents generate different behaviors, which can be exploited for independent control. For example, when microrobots are designed with different geometric shapes (e.g., different helical shapes (Figure 6a) [144] and different sizes [145]), an identical control input can lead to different speeds and even opposite moving directions. By appropriately designing the control input sequence, multiple microrobots can move along different trajectories. For magnetic microrobots, differences in magnetic properties can also be exploited for independent control. A. W. Mahoney et al. focused on the different step-out frequencies of helical microswimmers with different magnetic volumes (Figure 6b), [146] and U. Cheang et al. exploited the multimodal motion of microswimmers with different magnetic properties. [147] 3D independent manipulation of multiple microrobots was first achieved by Diller et al. [148] The researchers proposed various microrobot designs with different geometric shapes and scales, each of which responds uniquely to a common driving magnetic field. Using a platform composed of six magnetic coils, 3D position control of multiple microrobots was achieved with high precision.
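To see why different step-out frequencies allow a single rotating field to drive swimmers at very different speeds, consider the following sketch. It assumes a common first-order model that is not taken from the cited works: below step-out the helix rotates synchronously with the field, and above step-out its time-averaged rotation rate drops to f - sqrt(f^2 - f_so^2). The parameter `advance_per_rev` (distance travelled per body revolution) and all numerical values are hypothetical.

```python
import math


def mean_speed(f_drive, f_stepout, advance_per_rev):
    """Mean forward speed (m/s) of a helical swimmer in a rotating field.

    f_drive and f_stepout are in Hz, advance_per_rev in m per revolution.
    """
    if f_drive <= f_stepout:
        # Synchronous regime: the body turns once per field revolution.
        f_body = f_drive
    else:
        # Above step-out the body slips and its average rotation rate drops.
        f_body = f_drive - math.sqrt(f_drive ** 2 - f_stepout ** 2)
    return advance_per_rev * f_body


def speeds_at(f_drive, swimmers):
    """Speeds of several swimmers, given as (f_stepout, advance_per_rev) pairs."""
    return [mean_speed(f_drive, f_so, adv) for f_so, adv in swimmers]
```

Under this model, driving two swimmers with step-out frequencies of 20 Hz and 60 Hz at 50 Hz leaves the second synchronous while the first slips to a small fraction of its peak speed, so the choice of drive frequency selects which agent makes most of the progress.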

Independent Control via Nonuniform Control Input
In addition to exploiting the heterogeneity of microrobots, independent control can also be realized by employing a nonuniform external control input. Magnetic field gradients have been widely explored for actuating magnetic microrobots. With an appropriately designed gradient distribution, independent control of multiple microrobots can be achieved. M. Yousef et al. built a magnetic actuation platform with four rotating permanent magnets. [149] The angles of the magnets were modulated by a PI controller. With the analytically designed magnetic field, the positions of two identical magnetic microrobots in the horizontal plane were controlled simultaneously.
Other than permanent magnets, electromagnetic coils can also be utilized for simultaneous manipulation of multiple objects. D. Wong et al. presented a method using four stationary electromagnetic coils to manipulate identical magnetic microrobots in a 2D plane (Figure 6c). [150] Denasi and co-workers achieved independent control of multiple microrobots using magnetic gradients produced by an electromagnetic system (Figure 6d). [151] A leader-follower controller and an independent controller were proposed for coordinated and independent motion control, respectively. Mellal et al. used optimal control theory to independently and robustly control multiple magnetic spherical microrobots. By decomposing the system into a controllable subsystem and an uncontrollable subsystem, the positions of multiple microrobots and the velocity of one agent were made controllable. [152] For 3D manipulation, F. Ongaro et al. designed and assembled a novel electromagnetic system. [153] By exploiting the inhomogeneity of the magnetic field generated by this system, independent control was accomplished both for two identical microrobots and for two nonidentical microrobots. At the center of the workspace, 3D manipulation of two microrobots along different reference paths was achieved with an average root mean square error of 102 μm.

Selective Control of Multiple Microrobots
Selective control is another solution for the individual manipulation of microrobots. By employing a platform capable of generating the desired control stimulus locally, or of selectively anchoring specific agents, independent manipulation of microrobots can be achieved.
Printed circuit board (PCB) technology is widely used in modern industry. Coils and wires can be integrated on a board with high density and precision. A local magnetic field is generated around each energized wire, and by selectively powering different coils and wires, the global magnetic field distribution can be programmed and configured. Researchers have achieved multi-robot manipulation using PCBs with a planar coil array (Figure 6e) [154] and a multiple-layer wire array, [155,156] in which each coil and wire could be powered individually.
Optical tools possess high directivity and low dispersion, which endow optically actuated microrobots with exceptional selectivity. The thermocapillary forces (Figure 6f) [157][158][159] and the self-thermophoresis phenomenon [109] induced by the temperature gradient generated by laser radiation have been applied for the independent control of microbubbles and self-thermophoretic microswimmers, respectively. Optical tweezers (OT) are a powerful tool for trapping micro-objects with high precision. The mechanism of locomotion control using OT is straightforward, and collision-free navigation of multiple microrobots was studied by A. Banerjee and co-workers. [160]
With specialized boundaries capable of anchoring specific microrobots, independent manipulation of microrobot agents can be realized using a global input. A selective method has been presented for remote actuation of identical microscrews. [161] Several magnetic screws were operated using a uniform rotating magnetic field, and selectivity was accomplished by using a strong field gradient to lock all screws that were not expected to move. E. Diller et al. proposed selective control of shell-based microrobots on an electrostatic grid surface (Figure 6g). [162] Each grid cell on the substrate can be activated individually to anchor the robots, and the unanchored agents can be navigated using a global magnetic field. Another study on selective control utilized a thermally responsive clamper to anchor microrobots, while the selected agent was navigated for tasks such as micromanipulation and assembly. [163] Shahrokhi's group achieved anchoring of microrobots in a tank with a nonslip boundary. [164] Microrobots could move freely inside the workspace and were trapped at the walls owing to the rough boundaries.

Independent Control Exploiting the Interaction Forces between Microrobots
In the aforementioned studies, the interaction forces between microrobot agents are either ignored because the agents are relatively far apart or avoided by specially choosing the materials of the microrobots. However, for some types of microrobots (e.g., magnetic microrobots and charged particles), the interaction force may not be negligible when the agents are close enough. Rather than being only a disturbance, this force can provide another means of independent control. M. Salehizadeh et al. presented a 3D independent manipulation method for magnetic microrobots (Figure 6h). [165,166] In a uniform magnetic field, the magnetic moment of each microrobot was assumed to align with the field direction instantly. Thus, under a strong magnetic field, the direction of the field relative to the line connecting the microrobots could be modulated, and consequently the inter-agent force and relative position were controllable. According to the superposition principle, an extra gradient field can pull the microrobots as an entity. Therefore, independent manipulation of microrobots utilizing the inter-agent force was achieved. Moreover, the controlled agents could be replaced with controllable magnetic grippers, and applications such as independent targeted cargo delivery were conducted. Mellal et al. controlled the inter-agent force using the combination of magnetic gradient control and oscillatory motion in a microfluidic channel. [167] Sadelli et al. studied the local controllability of a pair of magnetic microrobots and used a backstepping controller to locally stabilize the nonlinear system. [168]
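How the field direction tunes the inter-agent force can be illustrated with the textbook point-dipole model. The sketch below is an assumption for illustration rather than the model used in the cited works: both microrobots are treated as point dipoles whose moments are parallel (both aligned with the uniform field), and `theta` is the angle between the moments and the line connecting the two agents.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability in T*m/A


def radial_dipole_force(m1, m2, r, theta):
    """Radial force (N) between two parallel point dipoles.

    m1, m2: dipole moments in A*m^2; r: separation in m;
    theta: angle (rad) between the moments and the separation vector.
    Positive values are repulsive, negative values attractive.
    """
    prefactor = 3.0 * MU0 * m1 * m2 / (4.0 * math.pi * r ** 4)
    return prefactor * (1.0 - 3.0 * math.cos(theta) ** 2)


# Angle at which the radial force vanishes (about 54.7 degrees).
MAGIC_ANGLE = math.acos(1.0 / math.sqrt(3.0))
```

Rotating the field from head-to-tail alignment (`theta = 0`, attraction) through the zero-force angle to side-by-side alignment (`theta = pi/2`, repulsion) changes the sign of the radial force, which is what makes the separation between the two agents controllable with a single uniform field.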

Control of Microrobot Swarms
For the actuation of multiple microrobots, one method is to control different individuals independently, which is introduced in Section 5.1. The other way is to actuate and control multiple microrobots as an entity, which is also known as a microrobot swarm. This method focuses on the cooperation of microrobots.
Swarm-like behavior is ubiquitous in nature; for example, birds form flocks to exploit the airflow during long-duration flight. Cooperation between individual animals can improve the survival rate and accomplish more work. Inspired by this concept, the formation and actuation of microrobot swarms have been explored in recent years. Under the equilibrium between external stimuli and the interactions among agents, thousands or even millions of microrobots aggregate to form a specific pattern and exhibit swarm behaviors. Various types of microrobot swarms triggered by acoustic waves, [169] electric fields, [170] light, [171] and magnetic fields [61,172][173][174][175] have been reported. In addition to artificially designed microrobot agents, microorganisms such as MTB can also assemble into swarms under the guidance of a magnetic field. [176][177][178] The control of microrobot swarms falls into four parts: the formation of swarms, the reconfiguration and deformation of swarms, the locomotion of swarms, and the independent control of multiple swarms.

The Formation of Microrobot Swarms
The first step for swarm control is to gather multiple microrobot agents and form a stable swarm. According to the formation mechanism, swarms can be triggered by convergent fields or by the dynamic equilibrium between the building blocks of the swarm. For the first category, swarms actuated by convergent forces such as magnetic gradient force, acoustic pressure, and fluidic force can be formed simply by aggregating microrobot agents at the position of minimal potential energy. [179][180][181][182][183][184][185] For swarms formed by the dynamic equilibrium of the inter-agent forces between building blocks, automatic formation control that gathers the microrobot agents already present in the working environment can increase the formation efficiency and make full use of the building blocks. The building blocks can be at millimeter scale, micrometer scale, or even smaller. For agents distinguishable by the naked eye or under a microscope, an orderly gathering scheme can be employed. A genetic algorithm was adopted to generate the optimal path passing through all the microrobot agents as the gathering order, and swarms consisting of magnetic droplets (Figure 7a) [46] and peanut-shaped hematite colloidal particles (Figure 7b) [186] were obtained with high efficiency. For swarms formed by nanoscale particles, the building blocks can be indistinguishable owing to the resolution limit of the imaging tool. Moreover, it is hard to plan for and trap millions of particles in the field of view one by one. Thus, a statistics-based swarm control scheme was proposed (Figure 7c). [63] Under the excitation of a rotating magnetic field, Fe3O4 particles tended to form a vortex-like swarm, with its parameters (e.g., shape ratio, pointing angle, moving direction, and locomotion speed) controllable. By processing the feedback grayscale image, the particle aggregations were extracted, and the particle utilization rate of the largest aggregation (swarm candidate) was obtained. A gathering improvement algorithm was also included to increase the particle utilization rate while keeping the swarm stable.

Figure 7. Automatic control of microrobot swarms. a) Automatic formation of magnetic droplet swarm using genetic algorithm. b) Automatic formation of peanut-shaped microrobot swarm using genetic algorithm. c) Gathering improvement control of a microswarm using the statistic-based method. d) Deformation of a ribbon-like microswarm. e) Deformation of a vortex-like swarm. f) Automatic shape control of a microswarm using fuzzy logic. g) Closed-loop control of a microswarm under US imaging guidance. h) Targeted locomotion of a magnetic microrobot swarm. Panel (a): Reproduced with permission, [46] Copyright 2021, IEEE. Panel (b): Reproduced with permission, [186] Copyright 2019, IEEE/ASME. Panel (c): Reproduced with permission, [63] Copyright 2019, IEEE. Panel (d): Reproduced with permission, [61] Copyright 2018, Nature. Panel (e): Reproduced with permission, [194] Copyright 2018, SAGE. Panel (f): Reproduced with permission, [190] Copyright 2020, IEEE. Panel (g): Reproduced with permission, [45] Copyright 2020, IEEE. Panel (h): Reproduced with permission, [193] Copyright 2020, IEEE. Panel (i): Reproduced with permission, [196] Copyright 2021, ACS. Panel (j): Reproduced with permission, [179] Copyright 2020, Mary Ann Liebert, Inc.
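The image-statistics step of such a scheme can be sketched as follows. The definition used here is an assumption, since the text does not spell it out: the particle utilization rate is taken as the pixel area of the largest aggregation divided by the total pixel area of all aggregations. The sketch also assumes that particles appear darker than the background and that a fixed `threshold` separates them; the function name is hypothetical.

```python
from collections import deque


def utilization_rate(gray, threshold=128):
    """Fraction of particle pixels that belong to the largest aggregation.

    gray is a 2D list of grayscale values (0 to 255).
    """
    rows, cols = len(gray), len(gray[0])
    # Assumption: particle pixels are darker than the threshold.
    mask = [[px < threshold for px in row] for row in gray]
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill one 4-connected aggregation and count its pixels.
            area = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return max(areas) / sum(areas) if areas else 0.0
```

A gathering routine could monitor this number frame by frame and adjust the field only while the rate keeps increasing, which avoids having to track individual particles.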

Reconfiguration and Deformation of Swarms
Different from a single microrobot, a swarm is a microrobot collective in dynamic equilibrium, whose deformation and reconfiguration endow it with better adaptivity and flexibility. Passive deformation can be achieved via direct contact and extrusion, for example to pass through narrow channels. [187] This deformation method exploits the cohesion and elasticity of the swarm and demands no special control input or algorithm. However, to improve the adaptability of the swarm to more complicated working environments, actively controllable deformation and reconfiguration are more worthy of study. [188,189] Z. Zhou et al. applied a dual-frequency acoustic signal to an aluminum substrate; the nodal point at the intersection of the two nodal lines could attract and trap particles to form a swarm. [179] By adjusting the ratio between the amplitudes of the two acoustic signals, the swarm would elongate or contract accordingly. A swarm driven by a programmable convergent field can exhibit more complicated shapes. X. Dong and M. Sitti established a 2D microrobot swarm control method. [181] A 2D magnet array was placed below the swarm; by programming the magnet distribution and polarity configuration, the geometric shape of the swarm could be precisely modulated in the horizontal plane. Magnetic microrobot swarms triggered by oscillating and rotating fields exhibit a strong correlation between the shape ratio of the swarm and the magnetic field parameters (Figure 7d,e). [38,61,172] By tuning the field parameters, the swarm can transform on demand. The automatic pattern control and reconfiguration of magnetic nanoparticle swarms were first proposed by L. Dong and co-workers. [190] The researchers presented a fuzzy-logic-based control method to automatically control the swarm deformation (Figure 7f). [190,191] The fuzzy-logic controller combined human experience with the open-loop fitting result, which served as the feedforward input; the control block diagram is shown in Figure 8. Compared with the PID scheme, fuzzy control overcame the strong nonlinearity and hysteresis of the deformation process. According to the experimental results, the proposed control scheme realized stable automatic deformation control with a high response speed.
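The combination of a feedforward term from an open-loop fit with a fuzzy correction can be illustrated with a minimal sketch. Everything specific here is an assumption and not the controller of the cited work: a zero-order Sugeno-type inference with three triangular membership functions (negative, zero, positive error), singleton rule outputs, weighted-average defuzzification, and a linear fit (`fit_slope`, `fit_offset`) standing in for the open-loop model.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def fuzzy_correction(error):
    """Map the shape-ratio error to a field-ratio correction."""
    e = max(-1.0, min(1.0, error))  # saturate to the universe of discourse
    # (firing strength, singleton output) for negative, zero, positive error.
    rules = [
        (tri(e, -2.0, -1.0, 0.0), -0.2),
        (tri(e, -1.0, 0.0, 1.0), 0.0),
        (tri(e, 0.0, 1.0, 2.0), 0.2),
    ]
    total = sum(w for w, _ in rules)
    # Weighted-average defuzzification; total is 1 everywhere on [-1, 1].
    return sum(w * u for w, u in rules) / total


def control_input(target_ratio, measured_ratio, fit_slope=0.5, fit_offset=0.1):
    """Feedforward from a hypothetical open-loop fit plus fuzzy feedback."""
    feedforward = fit_slope * target_ratio + fit_offset
    return feedforward + fuzzy_correction(target_ratio - measured_ratio)
```

The feedforward term brings the input close to the value predicted by the fit, so the rule base only has to correct the residual error; in practice the rule outputs would encode operator experience rather than the placeholder values used here.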

Locomotion of Swarms
Precise locomotion is the key to automatic control of microrobot swarms in applications such as cargo delivery and targeted therapy. The automatic control of a single microrobot agent and of multiple agents has been introduced in previous sections. Similarly, an effective propulsion method and feedback strategy also enable controllable navigation of swarms. When a nonuniform external field is applied, the swarm can be directly pulled by the convergent point as an entity. An acoustic field can be utilized to trigger the generation and migration of a microrobot swarm. Researchers also exploited a vibration tweezer to trap microparticles. [179] By moving the nodal point of the acoustic field through waypoints along a predefined trajectory, the swarm could be automatically navigated to the target position. Light has been explored to drive photoactive micro-objects in recent years. The thermal effect of light radiation is an ideal candidate for generating a temperature gradient, which results in convection flows inside the liquid medium. When the illumination spot was changed, the swarm migrated along with the actuation light, indicating stable and controllable locomotion. [192] Magnetic microrobots tend to move toward positions with higher field strength in a gradient field. A swarm consisting of multiple magnetic droplets could be navigated by moving a magnetic needle: the swarm was attracted by the field gradient and followed the path of the needle. [46] A magnetic tweezer device containing five poles was developed, as depicted in Figure 7h. [193] By moving the movable pole, the position of maximum magnetic field strength was relocated, leading to the locomotion of the microrobot swarm. Since MTB migrate along the external field direction, the swarming and locomotion of MTB can be induced by a nonuniform magnetic field. D. Loghin et al. presented a platform consisting of four coils. [176] Using this platform, the researchers achieved swarm formation and displacement control for targeted drug delivery. S. Martel et al. utilized MTB to achieve micromanipulation and built a pyramid-like structure using SU-8 blocks. Because the magnetic field is a passive field, it is impossible to construct a static magnetic field that converges to a single point in 3D space. A time-varying control scheme was demonstrated to overcome this limitation. [177] According to the field sequence provided in this work, the field could converge to a specific point when integrated over time. Magnetic-torque-actuated swarms can be propelled by the friction force asymmetry induced by the pitch angle of the field plane. [61,194,195] By tuning the yaw angle and pitch angle of the field, the moving direction and speed of the swarm can be adjusted accordingly. Researchers proposed an automatic control scheme for a magnetic-torque-actuated microrobot swarm. [63] A linear-quadratic-integral (LQI) controller was employed for closed-loop navigation of the swarm of magnetic nanorobots. For an ex vivo scenario, Q. Wang et al. applied B-mode US imaging for closed-loop control of a microrobot swarm in a channel covered by chicken tissue (Figure 7g). [45] Localization was performed by the frame differencing method on the feedback US images, and the closed-loop algorithm was a PID controller. The advantages and challenges of different types of swarms for automatic control and further application are given in Table 4.

Figure 8. The architecture of the fuzzy-logic controller for microrobot swarm control. The actuation system is a lab-made 3D Helmholtz coil system, powered by commercial amplifiers. The swarm shape ratio was obtained by image processing and ellipse fitting. With the given reference including swarm orientation, pattern, and position, the corresponding control inputs were calculated via a fuzzy-logic controller. Combined with the feedforward control inputs from the model fitting results, the swarm could perform accurate, controllable swarm behavior. Panel: Reproduced with permission, [191] Copyright 2021, IEEE.
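A closed-loop PID position controller of the kind mentioned for image-guided swarm navigation can be sketched as below. The sketch is illustrative only: the gains, the sampling period, and the toy plant in `simulate` (swarm velocity equal to the control input, with the simulated position standing in for the position localized from the feedback image) are assumptions, not values from the cited work.

```python
class PID:
    """Discrete PID controller for one position axis."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, target, measured):
        """Return the control input for the current sample."""
        error = target - measured
        self.integral += error * self.dt
        # No derivative action on the first sample (no previous error yet).
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(target=1.0, steps=1000, dt=0.05):
    """Toy plant (assumption): the swarm velocity equals the control input."""
    pid = PID(kp=2.0, ki=0.5, kd=0.05, dt=dt)
    x = 0.0
    for _ in range(steps):
        # In an experiment, x would come from image-based localization.
        x += pid.update(target, x) * dt
    return x
```

For planar navigation, one such controller per axis can track successive waypoints, with the controller outputs mapped to the field direction and strength of the actuation system.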

Independent Control of Swarms
Similar to multiple microrobot agents, swarms can also be independently controlled by exploiting the anisotropy of the system. A scheme for independent control of swarm patterns was presented by Du and co-workers using nanoparticles and nanorods as the respective swarm building blocks (Figure 7i). [196] The shape ratio of the nanoparticle swarm pattern showed a monotonic positive correlation with the field ratio. [61] For the nanorod swarm, two distinct regions were observed. In the anomalous region, the shape ratio exhibited a negative correlation with the field ratio. In contrast, in the normal region the shape ratio increased or decreased together with the field ratio. Under this mechanism, independent swarm pattern modulation can be achieved with the same uniform magnetic field. The independent locomotion control of acoustic-wave-actuated microrobot swarms was accomplished by generating two local convergent fields using a vibration tweezer. The swarms were trapped and actuated independently and could be navigated along different trajectories (Figure 7j). [179]

Table 4. Comparison between microrobot swarms actuated by different power sources.

| Driving power | Advantages | Challenges |
| --- | --- | --- |
| Magnetic field | Multimodal swarms can be triggered using different types of magnetic fields. | The formation and stability of the swarm are easily affected by the substrate and the surrounding liquid environment. |
| | Tuning the parameters of the field leads to stable active pattern reconfiguration and deformation. | The locomotion control can be relatively complicated. |
| | Magnetic fields penetrate biological tissues and are relatively safe for them, making in vivo application possible. | |
| Light | The swarm follows the light-irradiated area, so the control scheme is simple and straightforward. | The working environment and trigger source are highly restricted, and the limited penetration ability of light makes light-driven swarms unsuitable for in vivo application. |
| | | Active and controllable swarm pattern reconfiguration is difficult to realize. |
| Acoustic wave | The formation mechanism is based on the pressure gradient, which provides a stable and compact swarm pattern without particle loss. | The formation of standing waves relies heavily on the working environment. |
| | | The locomotion of the swarm is based on the attraction of the nodal point, so the motility depends on the calculation of the node position. |
| Electric field | High controllability; formation and locomotion respond quickly to the external stimulus. | Low biocompatibility; applications are restricted to in vitro environments. |
| | | The building blocks are limited to dielectric particles. |

Microrobot Control Assisted by Machine Learning
Improving the level of automation and intelligence of robots is a key issue in robotics research. In the microrobotic field, there are two possible solutions. The first is to exploit the intrinsic properties of advanced materials [197][198][199] (physical intelligence), and the second is to implement advanced algorithms in the control system (computational intelligence). Here we mainly focus on computational intelligence, that is, the integration of advanced control algorithms with microrobotic systems. Systems that imitate humans in performing mental activities, such as understanding language, driving in a real street, doing mathematics, and writing programs, are considered to possess some degree of artificial intelligence (AI). [200] Conventional control methods are based on rules made by researchers, and the system simply acts strictly according to the given commands. In contrast, the essential idea of AI is to enable the system to learn the control rules automatically. Machine learning algorithms have strong fitting, summarizing, and classifying abilities; after training on adequate and effective datasets, the system can generate optimal decisions for practical applications.

Model Fitting and Control Optimization
To date, much research has applied machine learning algorithms to microrobotic applications, including kinetic model fitting, [62,120] gesture recognition, [48] and gait optimization. [201,202] A neural network can be regarded as a black-box system. When trained on an appropriate dataset, neural networks exhibit strong nonlinear fitting and input-to-output mapping abilities. [203] The dynamics of a microrobot can be extremely complicated, with uncertainties and disturbances. By introducing artificial neural networks and machine learning algorithms, the controller design can be simplified. A radial basis function (RBF) network has been applied to model the dynamics and compensate for the locomotion errors of helical-shaped swimmers. [62,120] Sitti's group developed a magnetic soft-bodied microrobot with multimodal locomotion. [71] The modeling and control methods for small-scale soft robots are highly limited owing to the complicated dynamics and fabrication uncertainties. Thus, the researchers utilized a Gaussian process (GP) and Bayesian optimization (BO) to establish the kinematic model and optimize the gait stride.

Microrobot Control via Reinforcement Learning
The nature of learning is to interact with the environment and summarize rules from past experience. [204] Reinforcement learning is built on this idea: by letting the agent interact with the environment and record the reward for every step, the system learns to select, for each environment state, the action with the maximum value. Iterating the learning process enables the system to optimize delayed rewards, which makes this class of algorithms suitable for long-sequence decision-making problems. In recent years, a growing number of researchers have shown great interest in combining reinforcement learning algorithms with the control of microrobots.
Reinforcement learning can be categorized into value-based methods and policy-based methods. One classical value-based method is Q-learning, in which a Q-table is established to record the Q value of each action in each state. A modified version of Q-learning called the deep Q-network (DQN) [205] was first proposed by DeepMind in 2013. [206] This algorithm utilizes a neural network to map the present system state to the Q value of each action. By iteratively training on past experience, the network optimizes the Q value of the best decision. The performance of this algorithm was validated on the Atari 2600 platform. In more than half of the games, the DQN controller even outperformed human players.
S. Muinos-Landin et al. proposed a navigation scheme based on the Q-learning method for targeted locomotion of a light-driven microrobot with and without obstacles in the exploring space (Figure 9a). [109] After training for more than 400 episodes, the controller was able to steer the microrobot to the target position under strong locomotion uncertainties induced by Brownian motion. The working environment was divided into grids, each grid cell represented a state, and eight actions were defined for each state. A reward was given after each step according to the system state, and the Q-table was constantly updated during the navigation process. The update equation is

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ R(s') + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$$

where s is the state, a is the action, and the prime (') denotes the value for the next state. Q(s, a) indicates the Q value for action a at state s, R(s') is the reward for the next state, γ is the reward discount, and α is the learning rate. The main drawback of Q-learning is that the number of system states it can handle is highly limited. To deal with a system with continuous states, researchers proposed a control scheme based on deep Q-learning to navigate a colloidal microrobot that moves in an on-off locomotion mode (Figure 9c). [111] The input of the network was the local environment around the microrobot, and the proxy coordinate of the target position was also included at the fully connected layer. After training, the microrobot was able to move toward the target position while robustly avoiding obstacles. Colabrese's group focused on navigating smart gravitactic swimmers in complex fluid flows (Figure 9d). [49] To navigate the active microswimmer by exploiting the underlying flow, the researchers proposed a reinforcement learning control scheme. The goal was to navigate the particle to the highest possible altitude while avoiding being trapped by the flow. A Q-learning algorithm was used to train the particle, enabling it to learn the optimal strategy for swimming upward.
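The tabular Q-learning scheme described above (gridded states, eight actions per state, a reward after every step) can be sketched as follows. This is a toy illustration and not the cited implementation: the grid size, the reward values, the epsilon-greedy exploration, and the deterministic motion (no Brownian noise) are all assumptions, and the function names are hypothetical.

```python
import random

# Eight actions: moves to the eight neighboring grid cells.
ACTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def q_update(Q, s, a, reward, s_next, alpha=0.5, gamma=0.9):
    """One Q-learning update: move Q[s][a] toward reward + gamma * max Q[s_next]."""
    best_next = max(Q[s_next])
    Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a])


def move(s, a, size):
    """Apply action a in state s, clamping to the grid boundary."""
    dr, dc = ACTIONS[a]
    return (min(max(s[0] + dr, 0), size - 1), min(max(s[1] + dc, 0), size - 1))


def greedy(Q, s):
    """Index of the action with the highest Q value in state s."""
    return max(range(len(ACTIONS)), key=lambda i: Q[s][i])


def train(size=5, goal=(4, 4), episodes=500, eps=0.2, seed=0):
    """Train a Q-table on a size x size grid with epsilon-greedy exploration."""
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * len(ACTIONS) for r in range(size) for c in range(size)}
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(100):
            a = rng.randrange(len(ACTIONS)) if rng.random() < eps else greedy(Q, s)
            s_next = move(s, a, size)
            # Assumed reward: +10 on reaching the goal, -1 for every other step.
            reward = 10.0 if s_next == goal else -1.0
            q_update(Q, s, a, reward, s_next)
            s = s_next
            if s == goal:
                break
    return Q
```

After training, following `greedy` from the start cell traces the learned route to the target; the table has one row per grid cell, which is why this approach scales poorly once the state space becomes large or continuous.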
Q-learning is a value-based algorithm: the learning result is the action with the highest Q-value for the current state. Another learning scheme is the policy-based algorithm, which directly generates a control policy giving the probability of choosing each action at the present state. Compared with value-based algorithms, policy-based algorithms are better at learning stochastic policies and at handling continuous action-space problems. Y. Yang et al. employed a reinforcement learning controller based on an actor-critic network to enable targeted locomotion for various types of microrobots (Figure 9b). [110] Three microrobots with different controllable DOF were used to validate the proposed control scheme. After training, all three types performed robust targeted locomotion in both free and obstacle environments. J. Zheng et al. proposed a deep deterministic policy gradient (DDPG)-based algorithm to hold the attitude of a robotic fish (Figure 9e). [207] The attitude of the fish and the local fluidic pressure were measured as inputs by an inertial measurement unit and 11 artificial lateral line systems, respectively. After training in the constructed simulation environment, the proposed reinforcement learning-based method exhibited better dynamic tracking performance and smaller steady-state error than the MPC controller.
Figure 9. [...] c) Targeted navigation of a microrobot using deep deterministic policy gradient (DDPG) algorithm. d) Navigation of a microrobot exploiting local fluidic field using DQN algorithm. e) Attitude holding control of a robotic fish using DDPG algorithm. Panel (a): Reproduced with permission, [109] Copyright 2021, Science. Panel (b): Reproduced with permission, [111] Copyright 2020, Wiley. Panel (c): Reproduced with permission, [110] Copyright 2020, Wiley. Panel (d): Reproduced with permission, [49] Copyright 2017, APS. Panel (e): Reproduced with permission, [207] Copyright 2021, IEEE.
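To make the distinction between value-based and policy-based action selection concrete, the sketch below contrasts a greedy choice over Q-values with sampling from a softmax policy, and adds a single REINFORCE-style preference update. It is an illustrative assumption only: the function names are hypothetical, and this simple discrete policy gradient is not the actor-critic or DDPG scheme used in the cited works.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)  # subtract the maximum for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_action(q_values):
    """Value-based selection: the single action with the highest Q-value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


def sample_action(prefs, rng):
    """Policy-based selection: draw an action according to its probability."""
    probs = softmax(prefs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def reinforce_update(prefs, action, ret, lr=0.1):
    """Return new preferences after one REINFORCE-style step.

    For a softmax policy, d log pi(action) / d prefs[i] equals
    1[i == action] - pi(i), so a positive return `ret` raises the
    probability of the action that was taken.
    """
    probs = softmax(prefs)
    return [p + lr * ret * ((1.0 if i == action else 0.0) - probs[i])
            for i, p in enumerate(prefs)]
```

A greedy rule always returns the same action for fixed Q-values, whereas the softmax policy keeps a nonzero probability for every action, which is one reason policy-based methods can represent stochastic policies directly.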
Benefiting from their strong fitting ability, machine learning algorithms have exhibited good performance in the control and gesture recognition of microrobots under practical disturbances and uncertainties in the absence of onboard sensors. Their potential for more complicated medical tasks in clinical situations still needs further exploration in future research.

Conclusion and Outlooks
Controllable, multifunctional microrobots are promising for complicated medical and biological tasks that remain challenging for conventional medical robotic equipment. However, the applications of microrobots are highly limited by their complicated locomotion mechanisms, inaccurate feedback signals, and disturbances in clinical environments. This review aims to provide a brief and systematic introduction to the control and autonomy of microrobots. To achieve remote and effective control of these miniature medical devices, different strategies for actuation and localization have been explored and proposed. Closed-loop control of microrobots can be realized by combining these strategies with control algorithms. For applications in more complicated environments, various path planning algorithms are integrated to accomplish targeted navigation and obstacle avoidance. Two categories of multi-agent control methods, independent control and swarm control, are also presented; a comparison between them is given in Figure 10. Furthermore, the introduction of machine learning algorithms offers another possible approach to microrobot control for dealing with the aforementioned challenges. After several decades of exploration, significant progress has been made in the microrobotic field. However, several challenges still hinder the further development of microrobots, as depicted in Figure 11, and we believe these will become research hot spots in the future:
1) In vivo treatment using microrobots. To date, most studies on microrobots are either in vitro, to prove a concept, or ex vivo, to validate the biocompatibility of the proposed method. In vivo environments, especially human bodies, are highly noisy, unpredictable, and complicated, which may cause theoretically effective localization and locomotion methods to lose feasibility. Moreover, the safety of the microrobot needs to be considered for in vivo applications.
The microrobot should be both functional and nontoxic. Also, after the treatment, the microrobot should be excreted from the human body through metabolism or degrade within a limited time. Animal experiments have been conducted in several studies, [18,19,77,90] yet in vivo applications and further clinical treatments in the human body still need more research.
2) Advanced structure design and functionalization of the microrobot. The geometric shape design, mechanical structure design, and functionalization of a microrobot essentially serve the practical application. For example, a microrobot with a spring-type compliant structure serves as a force sensor with high resolution, [208] a catheter with a helical-shaped tip is suitable for clot removal via mechanical rotation, [22] a microgripper achieves controllable cargo capture and release, [165,166] Fe3O4 nanoparticle-coated microrobots enable magnetic actuation and FI, [19,122] and a microrobot loaded with drugs can be used for targeted therapy. [209] Thus, for further applications of microrobots such as biopsy and minimally invasive surgery, novel structures and functionalization are essential.
3) Combination of microrobots and artificial intelligence algorithms. Since the 1950s, when the concept of modern artificial intelligence was first proposed, the field has developed numerous branches and made significant progress. However, the integration of AI algorithms and microrobot control is still rather superficial. The highly noisy, complicated, dynamic, and unpredictable working environment of microrobots is a setting in which AI algorithms can play to their strengths. For future research, more advanced applications, such as optimal decision making and dynamic environment adaptation for microrobots assisted by AI algorithms, are of great value.
4) The integration of control system and imaging modality. Although actuation systems and imaging equipment have matured in recent years, the combination of these two technologies still lacks a successful case. To achieve effective and reliable control of microrobots, a compact and highly integrated system capable of generating accurate control stimuli and processing feedback signals is crucial.
After decades of exploration, scholars have gained a better understanding of the locomotion mechanisms, control methods, tracking modalities, and autonomy of microrobots. With more advanced materials, structure designs, control theories, and highly integrated systems, more clinical treatments using microrobots will be carried out in the future.
Figure 10. Multiagent control for microrobots in a static environment. Independent control can be achieved by exploiting the heterogeneity of microrobot agents, the control input, and the surrounding environment. Swarm control treats the swarm as an entity; swarm behaviors including formation, reconfiguration, and locomotion are studied.
Figure 11. Future challenges of remotely controlled microrobots, including in vivo treatment, advanced structure design and functionalization of microrobots, microrobot control assisted by artificial intelligence, and integration of the actuation system and imaging equipment.