Computer‐Vision Based Gesture‐Metasurface Interaction System for Beam Manipulation and Wireless Communication

Abstract Hand gestures play an important role in many circumstances and are among the most common interaction methods in daily life, especially for disabled people. Human–machine interaction is another popular research topic aiming at direct and efficient control, making machines intelligent and maneuverable. Here, a special human–machine interaction system, named the computer-vision (CV) based gesture–metasurface interaction (GMI) system, is proposed for both direct beam manipulation and real-time wireless communication. The GMI system first selects its working mode according to a gesture command, determining whether to perform beam manipulation or wireless communication, and then validates the operator's permission by recognizing an unlocking gesture to ensure security. Both the beam manipulation and wireless communication functions are validated experimentally. The results show that the GMI system can not only switch and remotely steer different beams in real time through gesture commands, but also communicate with a remote computer in real time by translating gesture language into text messages. The proposed non-contact GMI system offers good interactivity, high flexibility, and multiple functions, with potential applications in community security, gesture-commanded smart homes, barrier-free communication, and so on.

In the pre-training stage, hand gestures, including numbers, letters, punctuation marks, kaomoji, and some function keys, are trained with the VGG16 network. The training platform is equipped with an Intel i9-13900KF 24-core processor, 64 GB RAM, and a single NVIDIA GeForce RTX 4090 GPU with 24 GB of video memory, and the training process is executed on the Windows operating system. The hand gestures and their corresponding meanings can be found in Appendix SI at the end of this Supporting Information, where the letters and numbers are defined based on the American Sign Language (ASL) standard, while the others are self-defined.
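Before entering the network, every captured frame must be resized to VGG16's fixed 224×224×3 input (see Figure S1). As a minimal illustration of this preprocessing step, the sketch below uses nearest-neighbour resampling on a plain list-of-rows image; a real pipeline would use bilinear resizing from OpenCV or PIL plus the usual normalisation:

```python
# Minimal nearest-neighbour resize to VGG16's fixed 224x224 input.
# Illustrative only: a real pipeline would use PIL/OpenCV bilinear
# resizing plus mean-subtraction normalisation before inference.
def resize_nearest(img, out_h=224, out_w=224):
    """img: list of rows, each row a list of (R, G, B) tuples."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

# Dummy 640x480 webcam frame filled with placeholder pixel values.
frame = [[(r % 256, c % 256, 0) for c in range(640)] for r in range(480)]
resized = resize_nearest(frame)
print(len(resized), len(resized[0]))  # 224 224
```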
Figure S1 shows the framework of the VGG16 network, which includes image preprocessing, feature extraction, and classification. The images need to be resized to fit the network before being input, as depicted in the black dashed box in Figure S1. After preprocessing, the feature extraction stage is applied, consisting of several convolution and max-pooling layers. As the convolution process proceeds, the spatial size of the feature maps shrinks while their channel dimension expands; this stage is marked by the red and green dashed boxes in Figure S1. Finally, three fully connected layers and a softmax layer complete the network and produce the prediction and classification, as described by the blue and orange dashed boxes.

Section S2. Design of control circuit
For the programmable metasurface design, the individual control of a 2-bit element is relatively complicated. Here, a register-based circuit board is used because it saves IO ports and simplifies the wiring. Figure S2a shows the operation principle of a single row of the designed control circuit board. Each row consists of 10 8-bit shift registers in cascade, exploiting the cascading capability of the NXP 74HC595D [S1]. It should be noted that this shift register has a maximum shift frequency of 100 MHz, which is one of the keys to the quick response of the whole metasurface. A 3.3 V direct current (DC) power supply is connected to the Vcc and MR_bar pins of each chip, and the GND pin is connected to the negative electrode of the DC power supply.
Moreover, the STCP and SHCP ports carry the release (latch) signal and shift signal of each chip, respectively. The Q0-Q7 ports are the 8 parallel outputs of each chip and are connected to the 8 PIN diodes of a supercell. To form the cascade, the key is to connect the Q7S port of each chip to the DS port of the next one. The data stream is input at the DS port of the first shift register, with the overall direction from right to left. It is crucial to note that the current of a single IO port of our commercial FPGA development board is not strong enough to drive so many chips, especially for the SHCP and STCP signals; moreover, a high-frequency shift signal consumes more power, which further tests the drive capability of the system. To provide enough current, 10 FPGA IO ports are bundled for the SHCP signal and another 10 for the STCP signal; together with the 10 data lines, a total of 30 IO ports are occupied on the FPGA to manipulate the active 2-bit programmable metasurface.
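The cascaded shifting described above can be sketched behaviourally as follows. The Python model below is illustrative only, not the actual FPGA logic: each SHCP edge shifts the 80-bit chain by one position (the Q7S output of each chip feeding the DS input of the next), and an STCP edge would latch the chain to the Q0-Q7 outputs driving the PIN diodes:

```python
# Behavioural sketch of one row of the control board: 10 cascaded 8-bit
# shift registers (74HC595-style). Names and structure are illustrative.
N_CHIPS, BITS = 10, 8

def shift_in(bitstream):
    chain = [0] * (N_CHIPS * BITS)      # cascaded shift stages, all cleared
    for bit in bitstream:               # one SHCP rising edge per bit
        chain = [bit] + chain[:-1]      # Q7S of chip k feeds DS of chip k+1
    return chain                        # an STCP edge would latch this state

stream = [1, 0, 1, 1] + [0] * 76        # 80-bit coding stream for one row
outputs = shift_in(stream)
# The first bit shifted in ends up at the far end of the cascade:
print(outputs[-4:])  # [1, 1, 0, 1]
```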
Meanwhile, we set the period of SHCP, Δtshift, to 320 ns; shifting through the 10 cascaded registers then constitutes one complete shift cycle, whose duration is the period of the STCP signal (the rendering or refresh time), Δtr = 25.6 µs. Each row of the control circuit drives 2 rows of meta-atoms, so a 20×20 metasurface requires 10 such rows. It is worth noting that the coding streams for the 10 rows are input simultaneously, so the total rendering time still equals Δtr. Figure S2b illustrates the time sequence of the metasurface in operation: the shaded region represents the metasurface being rendered, while the rainbow region represents the emission of the desired radiation pattern, whose holding time Δtn is defined by the user. The layout of the circuit board is shown in Figure S3. The circuit board has a 3-layer structure spaced by 2 layers of FR4 (h=0.5 mm, ε=2.2, δ=0.001) substrate, with the 10 data inputs on the right side and the other inputs, including signal and power, at the four corners.

We designed a dual-mode X-band horn as the feeding source of the metasurface, as shown in Figure S5a. The simulated and measured reflection coefficient (S11) and far-field radiation patterns are shown in Figures S5b and S5c, respectively. The results show that S11 is lower than -15 dB within the whole X band from 8 to 12 GHz, while the realized gain patterns in both the E-plane and H-plane are quite symmetrical, with an overlapping angular range from -52° to 52° at 10 GHz.
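The quoted refresh time follows directly from the shift-clock arithmetic, which can be checked in a couple of lines:

```python
# Sanity check of the refresh-time arithmetic: 10 cascaded 8-bit registers
# need 80 SHCP shifts of 320 ns each per row, and all 10 rows are loaded
# in parallel, so one full-frame refresh equals one row refresh.
dt_shift_ns = 320
bits_per_row = 10 * 8
dt_refresh_us = bits_per_row * dt_shift_ns / 1000
print(dt_refresh_us)  # 25.6
```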

Section S4. GMI system Unlock process
For a personally used system, an unlock mechanism is necessary for the GMI system. For this purpose, we established a gesture-unlocking interface, whose unlocking process is shown in Figure S6. When a hand is shown in front of the webcam, a welcome interface appears on the screen through skin-color-based detection, as shown in step 1 of Figure S6. The user then slides the unlock bar button from left to right with a finger to activate the system. Next, a gesture-based Sudoku-style unlock interface appears, consisting of 9 white circular regions, as shown in step 2 of Figure S6. Only when the correct unlock pattern is drawn by moving the finger is the system finally unlocked and handed over to the user, as shown in steps 3 and 4 of Figure S6. The experimental demonstration of the unlock process can be seen at the beginning of Supplementary Videos S1 and S2.
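The Sudoku-style pattern check can be sketched as follows. The circle coordinates and the stored key are hypothetical placeholders, and fingertip positions are assumed to be already extracted by the skin-color detector:

```python
# Sketch of the Sudoku-style pattern check (illustrative, not the paper's
# code): fingertip positions are snapped to the nearest of 9 circle
# centres, consecutive duplicates are collapsed, and the visited sequence
# is compared to the stored key.
import math

CENTRES = [(x, y) for y in (100, 200, 300) for x in (100, 200, 300)]  # assumed 3x3 grid (pixels)
STORED_KEY = [0, 1, 2, 5, 8]  # example unlock pattern: top row, then right column

def nearest_circle(px, py):
    return min(range(9), key=lambda i: math.dist((px, py), CENTRES[i]))

def unlocked(finger_track):
    visited = []
    for px, py in finger_track:
        c = nearest_circle(px, py)
        if not visited or visited[-1] != c:
            visited.append(c)
    return visited == STORED_KEY

track = [(105, 95), (198, 102), (303, 99), (301, 201), (297, 304)]
print(unlocked(track))  # True
```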

S5.1. The workflow of beam manipulation
Five functionalities were implemented to demonstrate the real-time beam manipulation ability of this system: pencil-like beam, dual beam, OAM beam, diffusion beam, and Bessel beam, corresponding to the gesture commands "1" to "5", respectively, as shown in Figure S7a. In addition, the pencil-like beam and Bessel beam can be further steered remotely in real time: they scan automatically within the range of -60° to 60° if sub-mode 1 is selected, or follow the swing of the finger if sub-mode 2 is chosen. The OAM beam can be further controlled in real time by showing the "plus" gesture (self-defined gesture "11") or the "minus" gesture (self-defined gesture "12") to increase or decrease the topological charge, which can be tuned continuously from -3 to 3. Figure S7b shows the real-time beam direction control user interface (UI) for the pencil-like beam and Bessel beam, in which the green region is used for skin-color detection.
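The mode selection and topological-charge control can be sketched as a small dispatch table; the numeric gesture codes follow Figure S7a, while the state-dictionary layout is an illustrative assumption:

```python
# Minimal dispatch sketch mapping recognised gesture commands to beam
# modes. The clamp keeps the OAM topological charge within the
# implemented -3..+3 range.
BEAM_MODES = {1: "pencil", 2: "dual", 3: "OAM", 4: "diffusion", 5: "Bessel"}

def apply_gesture(state, gesture):
    if gesture in BEAM_MODES:
        state["mode"] = BEAM_MODES[gesture]
    elif gesture == 11 and state.get("mode") == "OAM":   # "plus" gesture
        state["l"] = min(state.get("l", 0) + 1, 3)
    elif gesture == 12 and state.get("mode") == "OAM":   # "minus" gesture
        state["l"] = max(state.get("l", 0) - 1, -3)
    return state

s = {}
for g in (3, 11, 11, 11, 11):   # select OAM mode, then "plus" four times
    s = apply_gesture(s, g)
print(s["l"])  # 3  (clamped at the maximum topological charge)
```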

S5.2. Experimental results for Beam Manipulation
Figure S8a shows the radiation pattern of the pencil-like beam under the gesture remote mode. The beam can be continuously scanned within ±60°, with only about 3 dB gain loss at the largest scanning angle. Figure S8b shows the experimentally verified dual beam and diffusion beam: two symmetrical radiation beams at ±30° are achieved in dual-beam mode, while the radiation is randomly scattered in diffusion mode. Figure S8c shows the measured phase distributions of the designed OAM beams at a distance of 700 mm from the metasurface, with the topological charge varying from -3 to +3.

S6.1. Workflow of Wireless Communication
Figure S9a shows the workflow of wireless communication in the GMI system. At the transmitting end, an input gesture image is first captured by the webcam and recognized, then converted to the corresponding RS232 command, which is sent to the FPGA and translated into a coding stream. The coding stream is delivered to the metasurface to generate either the high-gain pencil-like beam (digital code "1") or the low-gain diffusion beam (digital code "0"), which together realize on-off keying (OOK) modulation. At the receiving end, the modulated signals are received by a horn antenna and decoded by the corresponding components; the decoded coding sequence is then translated into the corresponding text message and shown on the display. The GUI at the receiving end presents not only the editing textbox but also a port selection and a baud rate setting. "Open" and "close" buttons activate and shut down the receiving end, with a corresponding hint in the "Serial port Statement" field. Furthermore, when the editing box is full, the "clean message" button can be used to remove all the information received before.
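The transmit-side mapping from a recognised character to a pattern sequence can be sketched as follows (the pattern names are illustrative labels for the two metasurface states):

```python
# Sketch of the transmit-side OOK mapping: each character becomes its
# 8-bit ASCII code, and each bit selects a metasurface pattern --
# high-gain pencil beam for "1", low-gain diffusion beam for "0".
def to_ook_symbols(ch):
    bits = format(ord(ch), "08b")
    return [("pencil" if b == "1" else "diffusion") for b in bits]

print(format(ord("A"), "08b"))   # 01000001
print(to_ook_symbols("A")[:2])   # ['diffusion', 'pencil']
```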

S6.2. Experimental setup for Wireless Communication
Figure S9b shows the detailed experimental setup of the transmitting end, which is composed of an RF signal generator (Keysight E8267D) feeding the horn antenna, a DC power supply driving the control circuit, a webcam capturing the gesture images, and a computer for data processing. Figure S9c shows the detailed experimental setup of the receiving end, which is composed of a horn antenna receiving the signals, a detector (AD8317) converting the RF signal to a DC analog signal, another DC power supply driving the detector, an FPGA with an ADC (AD9280 chip) for signal conversion, and a notebook computer displaying the text message.
It should be noted that the distance between the metasurface and the receiving horn can in theory be further increased for far-field (more than 6 m) communication. However, if a longer communication distance is required, a power amplifier (PA) and a low-noise amplifier (LNA) should be added before the feeding horn and after the receiving horn, respectively, to counteract the free-space attenuation.
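To get a feeling for the extra loss such amplifiers would need to cover, the free-space path loss (Friis) at 10 GHz can be evaluated for a few distances; the numbers below are illustrative, not a measured link budget:

```python
# Free-space path loss (Friis) at 10 GHz for a few illustrative
# distances. Going from 2 m to 20 m adds about 20 dB of path loss,
# which is what the suggested PA/LNA pair would have to make up.
import math

def fspl_db(d_m, f_hz):
    c = 3e8  # speed of light, m/s
    return 20 * math.log10(4 * math.pi * d_m * f_hz / c)

for d in (2, 6, 20):
    print(d, round(fspl_db(d, 10e9), 1))  # 2 58.5 / 6 68.0 / 20 78.5
```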

Section S7. Phase compensation of different-type beams in beam manipulation
For all kinds of beams in this work, such as the pencil-like beam, dual beam, OAM beam, diffusion beam, and Bessel beam, the same phase compensation is first required on the metasurface to eliminate the phase difference caused by the feeding horn antenna. Assuming that the distance between the ith element and the phase center of the feeding horn antenna is di, the spatial phase delay can be calculated as φi,delay = k0·di, where k0 is the wave number in free space.
The additional required phase compensations for pencil-like beam, multi-beam, OAM beam, diffusion beam and Bessel beam will be further discussed in the following text, respectively.
Pencil-like beam. Based on reflectarray antenna theory, the phase compensation required at position (xi, yi) of the metasurface for a pencil-like beam with deflection direction (θ, φ) is given by φi = -k0·sinθ·(xi·cosφ + yi·sinφ), where θ and φ represent the elevation angle and azimuth angle in the spherical coordinate system, respectively.
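A sketch of this synthesis, combining the spatial-delay compensation with the deflection term and quantising to the 2-bit (90°) phase states, is given below; the element pitch and feed height are assumed placeholder values, not the fabricated geometry:

```python
# Sketch of pencil-beam phase synthesis for a 20x20 metasurface:
# spatial delay k0*di from the feed plus the deflection term,
# quantised to 2-bit (90-degree) phase states. PITCH and FEED_H are
# illustrative placeholders, not the fabricated dimensions.
import math

F = 10e9                              # operating frequency, Hz
K0 = 2 * math.pi * F / 3e8            # free-space wave number
PITCH, FEED_H = 0.015, 0.20           # assumed 15 mm pitch, 200 mm feed height

def element_phase(ix, iy, theta, phi):
    x, y = (ix - 9.5) * PITCH, (iy - 9.5) * PITCH   # grid centred at origin
    d = math.sqrt(x * x + y * y + FEED_H * FEED_H)  # distance to feed phase centre
    comp = K0 * d                                    # cancel spherical-wave delay
    defl = -K0 * math.sin(theta) * (x * math.cos(phi) + y * math.sin(phi))
    return round(((comp + defl) % (2 * math.pi)) / (math.pi / 2)) % 4  # 2-bit code

print(element_phase(0, 0, math.radians(30), 0) in range(4))  # True
```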
Bessel beam. For the Bessel beam, the phase distribution of the metasurface mainly depends on the designed depth-of-field vector L, the elevation angle θ, and the azimuth angle φ. The main purpose is to find a conical phase distribution that satisfies the parameters mentioned above.
Assume that a corner element of the metasurface (any one of the four corners) is R away from the central position. The convergence angle δ (the angle between S and L) and the angle γ (the angle between L and the generatrix of the phase conical surface) should be determined first from L, R, and (θ, φ), where S is the vector equal to L + R. Based on the law of cosines, cos δ = (|S|² + |L|² - |R|²)/(2|S||L|), from which γ, the angle between the direction (θ, φ) and the generatrix, follows. Then, the focal-spot vector li, where the focal spot is contributed by the ith element, is deduced from the similar-triangle relation, where ri represents the position vector of the ith element. One step further, we can calculate the distance si = |li - ri|, and then αi, the angle between the direction (θ, φ) and ri. Once αi is determined, ζi (the angle between ri and the projection of ri on the phase conical surface) can be calculated.

Multiple beams.
For beams pointing at directions (θ1, φ1), (θ2, φ2), …, (θn, φn), according to Eq. (2) we can obtain their individual phase distributions φ1, φ2, …, φn, respectively, and the complex field distribution of the ith beam can be written as σi = exp(jφi). Based on the field superposition principle, the total phase distribution of the metasurface for multi-beam radiation can be calculated by Φ = arg(Σi=1..n exp(jφi)). Here, dual beams pointing at (-30°, 0°) and (30°, 0°) are shown as the demonstration.
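The superposition step can be sketched numerically as follows (unit element amplitudes assumed):

```python
# Sketch of the field-superposition step for dual beams at (-30, 0) and
# (30, 0): sum the unit-amplitude complex fields of the single-beam
# phase profiles and take the argument as the combined metasurface phase.
import cmath, math

def beam_phase(x, y, theta_deg, phi_deg, k0):
    th, ph = math.radians(theta_deg), math.radians(phi_deg)
    return -k0 * math.sin(th) * (x * math.cos(ph) + y * math.sin(ph))

def superposed_phase(x, y, beams, k0):
    total = sum(cmath.exp(1j * beam_phase(x, y, t, p, k0)) for t, p in beams)
    return cmath.phase(total)

k0 = 2 * math.pi * 10e9 / 3e8
# For symmetric beams the two spiral terms cancel, leaving a real sum,
# so the combined phase at this element is (numerically) zero:
phi = superposed_phase(0.0075, 0.0, [(-30, 0), (30, 0)], k0)
print(abs(phi) < 1e-6)  # True
```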
Diffusion beam. To achieve the diffusion beam, a random phase distribution is required on the metasurface. In this scheme, the phase of the ith element is set as φi = randi([0,3]) × π/2, where randi([0,3]) generates a random integer from 0 to 3, corresponding to the four states of the 2-bit elements.

OAM beam.
For the general form of an OAM beam, the required phase distribution of the metasurface can be calculated by φi = l·arctan(yi/xi) - k0·sinθ·(xi·cosφ + yi·sinφ), where l indicates the topological charge. In this scheme, we only consider θ = φ = 0°, so the deflection term vanishes, and l = ±3, ±2, ±1 for simplicity.
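For θ = φ = 0°, the spiral phase reduces to l times the azimuthal angle of each element, which can be sketched as:

```python
# OAM spiral phase for theta = phi = 0: each element gets l times its
# azimuthal angle, wrapped to [0, 2*pi).
import math

def oam_phase(x, y, l):
    return (l * math.atan2(y, x)) % (2 * math.pi)

# Element on the 45-degree diagonal with topological charge l = 2:
print(round(math.degrees(oam_phase(1.0, 1.0, 2)), 1))  # 90.0
```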

Section S8. OOK modulation and demodulation in wireless communication
For the wireless communication system, the distance between the receiving horn antenna and the metasurface is 2 m. When the metasurface works in the high-gain and low-gain patterns, the outputs of the AD8317 module at the receiving end are about 0.55 V and 1.58 V, respectively. Therefore, we set the threshold of the analog-to-digital converter (ADC) to 1 V: when the metasurface operates in the high-gain pattern, the input voltage of the ADC is 0.55 V (below the 1 V threshold) and the ADC outputs the digital code "1"; otherwise, it outputs the digital code "0". Because most of the transmitted gestures are defined based on ASCII codes, the 8-bit binary numbers can be directly converted to the corresponding characters. For the other, self-defined gestures, a one-to-one correspondence between the 8-bit binary codes and the gesture meanings needs to be established first. The receiving end then only needs to extract the correct 8-bit sequence to recover the transmitted message.
At the receiving end, the horn continuously samples the received signal. In the ASCII table, every symbol starts with 0 (for example, 01000001 represents A) and no symbol starts with 1, so the leading 0 can be treated as a check (start) bit. If the metasurface keeps the high-gain pattern (code 1) while the system is waiting for orders, then when a symbol is sent by OOK modulation, the first shifted-in code 0 marks the start bit, and we can accurately begin recording the transmitted symbol.
However, if the diffusion pattern (code 0) were set as the default mode while the system waits for orders, recovering the information would be relatively complex, as shown in Figure S10.
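The threshold-plus-start-bit recovery described above can be sketched as follows (voltage levels per the AD8317 figures quoted in this section):

```python
# Receive-side decoding sketch: threshold the detector voltage
# (high-gain pencil beam -> ~0.55 V -> bit 1, diffusion -> ~1.58 V ->
# bit 0), wait for the leading 0 start bit of each ASCII symbol, then
# collect 8 bits and convert them to a character.
def decode(voltages, threshold=1.0):
    bits = ["1" if v < threshold else "0" for v in voltages]
    msg, i = "", 0
    while i + 8 <= len(bits):
        if bits[i] == "0":                    # ASCII symbols start with 0
            msg += chr(int("".join(bits[i:i + 8]), 2))
            i += 8
        else:                                 # idle high-gain carrier
            i += 1
    return msg

# Two idle "1" samples, then 'H' (01001000) and 'i' (01101001):
samples = [0.55, 0.55] + [1.58, 0.55, 1.58, 1.58, 0.55, 1.58, 1.58, 1.58] \
        + [1.58, 0.55, 0.55, 1.58, 0.55, 1.58, 1.58, 0.55]
print(decode(samples))  # Hi
```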

Section S9. Response time for the system
The response of the system itself, whether in beam manipulation or communication mode, is at the millisecond level. In beam manipulation mode, the skin-color detection takes about 9 ms to make a judgment. Once the beam direction is determined, the beam-direction command is sent to the FPGA; this step is governed by the baud rate of the serial port, which is 115200 bit/s in our case, so an 8-bit command costs about 70 µs. After that, the corresponding pattern is loaded onto the metasurface, which costs about 25.6 µs. The total time consumption for beam manipulation is therefore less than 10 ms.
In communication mode, each gesture represents one symbol and simple OOK is applied, so the theoretical recovery time for one symbol is 8 (sampling count for the 8-bit ASCII code) × 2.048 ms (sampling interval) = 16.384 ms. Limited by the experimental computer (Intel i7 9700, 32 GB RAM), the response time of the trained network is about 70 ms, and the recovery in the MATLAB program takes 3 ms. In theory, the response time for each symbol should therefore be about 90 ms (including the command-sending time).
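These figures can be added up to check the quoted per-symbol budget:

```python
# Rough per-symbol latency budget from the figures quoted above
# (network inference dominates; the serial command itself takes only
# tens of microseconds).
serial_s   = 8 / 115200           # 8-bit command at 115200 bit/s (~69 us)
refresh_s  = 25.6e-6              # metasurface refresh time
network_s  = 70e-3                # VGG16 inference on the test machine
sampling_s = 8 * 2.048e-3         # 8 OOK bits at 2.048 ms each
recover_s  = 3e-3                 # MATLAB-side recovery

total_ms = (serial_s + refresh_s + network_s + sampling_s + recover_s) * 1e3
print(round(total_ms, 1))  # 89.5
```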
In practice, the total response time is mainly limited by the gesture capture process.
Considering that an operator may be unfamiliar with sign languages, we set the camera sampling time to 3 seconds, as can be observed in the supplementary videos, so the total transmission time for each symbol is about 3.09 seconds in this design. However, this sampling time can be dramatically reduced by practicing sign language, thereby greatly improving the overall response speed of the system.

Figure S1. The basic framework of VGG16 in training tasks. Due to the fixed input size, all input images are resized to RGB images of size 224×224×3.

Figure S2. Operation principle of the control circuit and timing analysis. a. Data shifts in a single row. b. Timing analysis of the metasurface.

Figure S3. Layout of the control circuit. The 10-way coding streams are input simultaneously at the right side.

Figure S4. Fabricated metasurface. a. Top view of the prototype. b. Bottom view of the prototype. c. Overall view of the metasurface and control circuit with their connection.

Figure S5. Fabricated X-band horn and its simulated and measured performance. a. Schematic of the designed X-band horn and its geometric parameters. b. Simulated and measured reflection coefficient of the horn. c. Simulated radiation performance at 10 GHz.

Figure S6. Workflow of unlocking the system to enable message sending.

Figure S7. Workflow of beam manipulation. a. Different mode choices for beam manipulation. b. Beam direction control UI for the pencil-like beam and Bessel beam.

Figure S8. Experimental results for the pencil-like beam, dual beam, diffusion beam, and OAM beam. a. Experimental results of the pencil-like beam in different directions. b. Experimental results of the dual beam and diffusion beam generated by the metasurface. c. Experimental near-field phase states of the OAM beam with different topological charge numbers.

Figure S9. Workflow of wireless communication and experimental setups of the transmitting and receiving ends. a. Operation process of the GMI system in wireless communication. b. Equipment arrangement at the transmitting end. c. Equipment arrangement at the receiving end.

Figure S10. Illustration of choosing the high-gain pattern as the default mode while the webcam is waiting for orders.