Realization of Index Modulation with Intelligent Spatiotemporal Metasurfaces

Advanced wireless communication with high spectrum efficiency and energy efficiency has always fascinated humanity, especially with the explosive increase of global mobile data services. Index modulation (IM) has recently emerged as a promising technique because it transmits additional data bits via the indices of the available transmit entities. However, the practical implementation of IM remains a great challenge because it requires complicated radio components. Herein, IM with intelligent spatiotemporal metasurfaces is experimentally demonstrated. The spatiotemporal metasurfaces provide a natural and versatile platform to achieve IM in a green and lightweight manner. The whole system is driven by a built-in inverse-design agent that automates the spatiotemporal metasurfaces to cater to diverse application demands. In doing so, how to mitigate the inherent nonuniqueness issue and how to set up the input target from practical scenes are concretely discussed. In the microwave experiment, the spatiotemporal metasurfaces are fabricated and their feasibility is demonstrated by harvesting two harmonic waves as communication channels. An intelligent electromagnetic platform that can manipulate electromagnetic waves in multiple dimensions is provided, benefiting numerous other intelligent meta-devices that avoid overburdening data analysis networks in the smart cities of the future.


Introduction
The pursuit of high data rates is a longstanding topic, especially with the growing global usage of smart devices and services, such as smartphones, autonomous driving, drones, and a broad range of augmented reality and virtual reality applications. It is roughly estimated that the global mobile data traffic will increase by 88 times from 2020 (57 exabytes/month) to 2030 communication applications, such as OFDM-IM [6,7] and MIMO-OFDM-IM. [8] However, achieving IM in practice faces great obstacles because conventional IM devices necessitate sophisticated and complicated radio devices, limiting the potential of IM for wide application. [9]
Metasurfaces, which are composed of subwavelength scatterers in a planar lattice, have attracted much interest due to their extraordinary ways of engineering light-matter interactions. [10][11][12][13][14][15][16] A myriad of passive metasurfaces have already been demonstrated to impart spatially varying amplitude, phase, and polarization changes on incident electromagnetic (EM) waves. [17][18][19] In the past decades, we have witnessed the discovery of many exotic physical phenomena and the design of optical systems that surpass the performance of conventional diffractive optical elements. [20][21][22][23][24][25] Reconfigurability is one of the most thriving branches; it gives rise to active upgrades of already demonstrated passive metasurfaces, which are usually called reconfigurable intelligent surfaces (RIS). [26][27][28][29][30][31][32] On this foundation, the temporal dimension has recently been introduced into metasurfaces to enable time-varying or spatiotemporal metasurfaces. [33][34][35][36][37] The space-time duality in Maxwell's equations suggests that, by applying time modulation to the reflection coefficient, permittivity, and surface impedance, time-varying metasurfaces can further expand their impact on the manipulation of EM waves in both the space and frequency domains. [38][39][40][41] Such an introduction breaks time-reversal symmetry and Lorentz reciprocity [42] and evokes many new physical phenomena, including the Doppler effect [43] and time-reversed behavior. These intriguing features suggest that time-varying metasurfaces may provide a natural platform to experimentally achieve IM because of their powerful manipulation of numerous harmonic waves. Moreover, this approach replaces sophisticated radio devices with the easy control of time-varying metasurfaces, enabling a lighter, cheaper, and more convenient communication architecture. [44,45]
In this work, we experimentally demonstrate the IM concept based on intelligent spatiotemporal metasurfaces. The spatiotemporal metasurfaces are physically achieved by feeding different time-varying signals into the electronic p-i-n diode incorporated in each meta-atom to dynamically modify its working state. According to Fourier theory, different harmonic waves can be generated and harvested to provide additional bits for the IM. To make the IM intelligent, we adopt deep learning algorithms to bridge the propagation route of harmonic waves with the time-varying sequences required by the spatiotemporal metasurfaces. To mitigate the nonuniqueness issue, [46][47][48] we adopt tandem neural networks in which a forward model is attached to achieve fast convergence of the inverse model. In addition, we discuss how to generalize the input of the neural networks for a vague target. Compared with existing spatiotemporal metasurfaces, the pretrained deep learning surrogate provides an "expressway" between metasurfaces and user demands, with computing speed improved by several orders of magnitude. In the experiment, we consider two wireless channels to mimic multiuser scenes and use two harmonic waves to demonstrate the IM.
This work substantiates a lightweight hardware processor to manipulate EM waves in the spatial and spectral domains and greatly promotes the development of intelligent metamaterials [49][50][51] and EM smart infrastructures. [52,53]

Results

IM Implementation on a Natural Platform: Spatiotemporal Metasurfaces
IM seeks to use the indices of building blocks to convey additional bits. These building blocks include antennas, channels, time slots, code patterns, subcarriers, and more. Compared with conventional techniques such as amplitude/frequency modulation, which modulate bits directly onto the building blocks, these additional bits are conveyed implicitly by different index patterns without additional energy consumption. Thanks to these additional bits, an IM-aided system can achieve the same throughput as its conventional counterparts with fewer resources, leading to reduced system complexity and lower energy consumption. The existing IM schemes are distinguished mainly by which kind of building block is used. [2][3][4][5] Subcarrier-IM, also called frequency-domain IM, is a mature scheme that has been investigated for more than a decade. Here, we consider frequency-domain IM as a demonstration. As shown in Figure 1, by allocating unique index numbers to the subcarriers f_1, f_2, and f_3, additional bits D_i can be conveyed to user equipment (UE). In principle, the more subcarriers are involved, the more information can be sent. For example, in Figure 2a, we designate the index numbers 0 and 1 to denote the +1st and the +3rd subcarrier, respectively. If the +3rd subcarrier is activated, the additional bit 1 (written in green) is demodulated at the receiver. Moreover, subcarrier-IM can be readily combined with diverse wireless communication schemes to raise spectral efficiency at the same bit error rate. [5]
Making subcarrier-IM a reality remains challenging. Conventional schemes need sophisticated and energy-consuming radio-frequency components to generate multiple frequency subcarriers and to steer the main beam of each subcarrier toward active users. To this end, spatiotemporal metasurfaces provide a pivotal platform.
As illustrated in Figure 1, when illuminated by a plane wave with frequency f_c, spatiotemporal metasurfaces can generate a multitude of harmonic waves, each of which can be regarded as a subcarrier. These harmonics are excited by imposing ultrafast time-varying series (periodically varying square waves) on the spatiotemporal metasurfaces. According to Fourier analysis, periodic square waves can be decomposed into the sum of a series of orthogonal sine functions with different angular frequencies; see Notes S1 and S2, Supporting Information. In the frequency domain, each sine function is associated with a harmonic wave whose angular frequency equals that of the corresponding sine function plus that of the incident wave. By carefully designing the time-varying series, the propagation route of the harmonic waves can be manipulated at will. Since the harmonic waves are generated and manipulated only by introducing time-varying series, no complex radio devices are involved, which allows a cheap, convenient, and low-energy IM executor. Furthermore, assisted by deep learning algorithms, an automatic and unmanned IM can be realized.
Figure 2. Working mechanism and structural design of spatiotemporal metasurfaces. a) Two situations in which the +1st and +3rd harmonics act as the carrier to send additional bits to user equipment (UE). b) Experimental reflection spectra of the metasurfaces when the bias voltage of the loaded diode is high (on) and low (off). The geometries of the metasurfaces are illustrated in the bottom-left inset. c) All equivalent states generated by the spatiotemporal metasurfaces for the +1st harmonic wave. L represents the length of the time-varying sequences. Here, we consider L equal to 2, 4, and 8. d) All equivalent states generated by the spatiotemporal metasurfaces for the +3rd harmonic frequency.

Structural Design and Working Mechanism of Spatiotemporal Metasurfaces
We consider spatiotemporal metasurfaces composed of tunable meta-atoms incorporating electronic p-i-n diodes. Each meta-atom has a square patch printed on an F4B substrate; the detailed structure is given in Note S1, Supporting Information. Under different bias voltages (high or low), the p-i-n diode switches between the on and off states to tune the reflection response of the metasurfaces. To guarantee the accuracy of the following inverse design, we experimentally measured the reflection data; see Figure 2b and Note S1, Supporting Information. According to the test results in Figure 2b and S1d, Supporting Information, the working band is about 3.48-3.52 GHz, within which the phase difference between the on and off states is relatively large, as illustrated in Note S1, Supporting Information. Herein, we choose 3.5 GHz as the main frequency. At 3.5 GHz (marked as f_0), the reflected phases are about −41.89° and 64.97° for the on and off states, respectively. The introduction of time-varying series to metasurfaces can be translated into many equivalent reflection responses for harmonic waves. In fact, these equivalent states are virtual working states derived from the Fourier coefficients of the time-varying series, a concept introduced to facilitate the following inverse design (Note S2, Supporting Information). For instance, for a periodic time-varying series (on-on-off-off-off-off-on-on), the Fourier coefficient will be 0.48e^{i1.42} for the +1st harmonic; hence, the equivalent magnitude and phase will be 0.48 and 76.4°, respectively. To clearly illustrate the relationship between equivalent states and time-varying series, we plot all equivalent states of the +1st harmonic with L varying from 2 to 8, as shown in Figure 2c, where L is the length of the time-varying series. When L approaches infinity, the number of equivalent states will be infinite. In contrast, if L becomes 0, the spatiotemporal metasurfaces degenerate to basic spatially modulated metasurfaces. The above theory also works for the other harmonics illustrated in Figure 2d. Because the measured amplitudes and phases of the two states are unequal, the effective reflections are distributed asymmetrically about the origin. Here, we want to emphasize that one time-varying series can induce only one equivalent state; however, one equivalent state can be induced by more than one time-modulated series. With the introduction of equivalent states and time-varying series, sophisticated functions can be achieved using simple metasurface structures, because effective states can act as real states in practical scenes. [34][35][36][37][38][39][40][41][42][43][44] In other words, intricate hardware structures are not strictly necessary.
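The mapping from a time-varying series to an equivalent state can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes unit reflection magnitudes for both diode states (the measured magnitudes are not quoted in the text) and the standard Fourier-series convention for a piecewise-constant periodic signal. The function name `equivalent_state` is hypothetical. Because of these assumptions, the printed values need not coincide with the coefficient reported above.

```python
import cmath
import math

# Reflection phases at 3.5 GHz quoted in the text; unit magnitudes are an assumption.
GAMMA_ON = cmath.rect(1.0, math.radians(-41.89))
GAMMA_OFF = cmath.rect(1.0, math.radians(64.97))


def equivalent_state(sequence, m):
    """Fourier coefficient of the m-th harmonic for one period of an on/off sequence."""
    length = len(sequence)
    # Each rectangular pulse of width T/L contributes a common sinc envelope ...
    x = math.pi * m / length
    envelope = 1.0 if m == 0 else math.sin(x) / x
    total = 0j
    for n, is_on in enumerate(sequence):
        gamma = GAMMA_ON if is_on else GAMMA_OFF
        # ... and a linear phase set by the centre of the n-th time slot.
        total += gamma * cmath.exp(-1j * math.pi * m * (2 * n + 1) / length)
    return envelope * total / length


# on-on-off-off-off-off-on-on, the example series from the text
seq = [1, 1, 0, 0, 0, 0, 1, 1]
for m in (1, 3):
    a = equivalent_state(seq, m)
    print(f"harmonic {m:+d}: magnitude {abs(a):.3f}, phase {math.degrees(cmath.phase(a)):.1f} deg")
```

A constant series (all on or all off) yields a zero coefficient for every nonzero harmonic in this sketch, which matches the intuition that harmonics arise only from switching.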

Inverse Design and Network Training
To automate the spatiotemporal metasurfaces, we employ deep learning to build a mapping between time-varying series and harmonic waves. Whether a harmonic wave is regarded as activated depends on whether the received magnitude exceeds a predefined threshold. In this process, a nonuniqueness issue inevitably exists: different spatial arrangements of equivalent states may generate the same or highly similar far fields. This makes it difficult for a conventional deep neural network to converge because conflicting training samples are introduced. To address this, a tandem deep neural network composed of a forward and an inverse neural network is introduced, whose architecture is shown in Figure 3a. The forward network is pretrained using the arrangement of equivalent states as input and the far field as output. Once well trained, the forward network can precisely predict the far-field distribution of a given arrangement. The inverse network aims to predict the spatial arrangement of equivalent states, with the desired far field as input. Hence, a tandem neural network is synthesized by cascading the output of the inverse network to the input of the forward network. Both the input and the output of the tandem network are far fields. The loss function is built by calculating the mean absolute error (MAE) between the input and output fields. During the training process, the tandem network strives to minimize the loss function by adjusting the weights in the inverse network. More details can be found in Note S3, Supporting Information. Once trained, the whole network can generate a target field by designing the spatial arrangement of the equivalent states (extracted from the intermediate layer). Finally, by referring to the relationship between time-varying sequences and equivalent states (Figure 2c,d), the time-varying sequences can be recovered from the equivalent states.
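The tandem objective described above can be written compactly. The notation here is an assumption introduced for illustration and does not appear in the original: G_phi denotes the inverse network with trainable weights phi, F_theta* the pretrained forward network with frozen weights, E_t the target far field, and N the number of far-field sample points.

```latex
\hat{\mathbf{s}} = G_{\phi}(\mathbf{E}_{\mathrm{t}}), \qquad
\hat{\mathbf{E}} = F_{\theta^{*}}(\hat{\mathbf{s}}), \qquad
\mathcal{L}(\phi) = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{E}_{i} - E_{\mathrm{t},i} \right|
```

Because the loss compares far fields rather than arrangements, two different arrangements that produce the same field incur the same loss, which is why the tandem structure sidesteps the conflicting-sample problem.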
To train and test the tandem neural network, training, validation, and test data sets (about 100 000 samples) are collected by randomly generating the spatial arrangement of equivalent states, and the corresponding far-field data are calculated based on antenna theory. We consider one-dimensional (1D) spatiotemporal metasurfaces with 8 × 8 meta-atoms; each column has 8 meta-atoms sharing the same bias voltage. To facilitate the network training, we pick the 8 equivalent states with the highest magnitudes for each harmonic, as enclosed by the blue and orange circles in Figure 2c,d, respectively. Each piece of far-field data is discretized into 181 points by sampling from −90° to 90°. The training results are illustrated in Figure 3b,c.
The predicted result by the forward network fits the ground truth well. Besides, the whole network can also precisely depict the tendency of the target field, especially for the main beam.
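The far-field calculation used to label the training data can be approximated by a simple array-factor sum. The sketch below is an illustration under stated assumptions rather than the authors' procedure: the column pitch of half a wavelength is assumed (it is not given in the text), the element pattern is ignored, and the function name `far_field` is hypothetical.

```python
import cmath
import math

NUM_COLUMNS = 8                     # columns sharing one bias voltage, as in the text
SPACING_WAVELENGTHS = 0.5           # assumed column pitch in wavelengths (not from the text)
ANGLES_DEG = list(range(-90, 91))   # 181 sample points from -90 to 90 degrees


def far_field(states):
    """Normalized far-field magnitude of a 1D arrangement of complex equivalent states."""
    pattern = []
    for angle in ANGLES_DEG:
        # Phase advance between adjacent columns for an observer at this angle.
        step = 2 * math.pi * SPACING_WAVELENGTHS * math.sin(math.radians(angle))
        total = sum(s * cmath.exp(1j * n * step) for n, s in enumerate(states))
        pattern.append(abs(total))
    peak = max(pattern)
    return [p / peak for p in pattern]


# A linear phase gradient across the columns steers the main beam toward 60 degrees.
step_60 = 2 * math.pi * SPACING_WAVELENGTHS * math.sin(math.radians(60))
steered = far_field([cmath.exp(-1j * n * step_60) for n in range(NUM_COLUMNS)])
print("main beam at", ANGLES_DEG[steered.index(max(steered))], "deg")
```

Random arrangements drawn from the eight selected states and passed through such a function would give input-output pairs of the kind described above.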

Demonstration of Multiuser Scenario and Preprocessing of Input Target
In practice, many users usually share the same cell. To mimic the multiuser scene, at least two users located at different positions should be taken into consideration. For simplicity, two users standing on the two sides of the spatiotemporal metasurfaces are considered in Figure 4a. To ensure that information bits are transmitted to active users reliably, the main beams usually need to be directed to these users, since the main beams carry most of the transmitted energy. Besides, in this work, we take the received harmonic intensities as the criterion to decide whether a harmonic is active. Hence, the main beams of working harmonics should be steered to the active users to ensure that the users' received intensity exceeds the decision threshold. In this regard, the input target needs to be updated continually so that the main beams of the harmonics self-orient to active users for sending and receiving bits. With the help of intelligent design, this process can be accomplished simply by feeding the onsite target into the trained network.
Another important ingredient is how to generate the input from practical scenarios. This is a nontrivial issue because it directly affects the output of the neural network. For instance, to shape the far field along 60°, the ideal input field is a delta function (Figure 4b). However, such a field is almost impossible to realize in practice. Moreover, even if we feed this delta function into the pretrained agent, the output field may deviate considerably from expectation.
Figure 5. Experimental results. a) Received intensity of the +1st harmonic when the receiver is located at ±60°. Orange and yellow bars correspond to the test results with the receiver placed at +60° and −60°, respectively. b) Specific spectrum for the cases of target 2 and target 3 in a. The +1st harmonic is highlighted as the only active subcarrier. c) Received intensity of the +3rd harmonic. Dark blue and light blue bars correspond to the test results with the receiver placed at the +60° and −60° directions, respectively. d) Specific spectrum received for the cases of target 2 and target 3 in c. The +3rd harmonic is highlighted as the only active subcarrier. e) Audio signal sent via the +1st and +3rd harmonics. Some bits are directly transmitted and the rest are implicitly sent via the index number.
To set up a suitable input from ambiguous information, we propose a sinc-function-like field. The sinc function is a special form of the raised-cosine function with alpha equal to 0. Apart from the beam width, a sinc-function-like field also provides many small, slowly descending sidelobes. As alpha becomes larger, the sidelobes of the field become increasingly unnoticeable, as demonstrated in Note S4, Supporting Information. As a transition between the delta and sinc functions, we also study a cutoff sin-function-like field that tends to have adequate beam width. We list the test results using these three kinds of target fields and use the Pearson correlation coefficient (PCC) to quantify the performance. As illustrated in Figure 4b, their PCCs are 13.8%, 89.0%, and 91.5%, respectively. This phenomenon can also be explained from a physical perspective. A delta-like field is nearly impossible to realize in practice because its requirement is too extreme. A cutoff sin-like field permits an adequate beam width without the restriction of a one-hot form. However, in practice, beamforming is not perfect, and sidelobes are usually inevitable. By comparison, the sinc function, which can mimic imperfect fields with sidelobes, eases this problem and is consistent with the salient field features.
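The target construction and the PCC metric can be sketched as follows. This is an illustrative example only: the helper names and the `width_deg` parameter (distance from the beam centre to the first null) are assumptions, and the exact pair of fields that the reported PCC values compare is not specified here, so no attempt is made to reproduce those numbers.

```python
import math

ANGLES_DEG = list(range(-90, 91))  # same 181-point angular grid as the far-field data


def sinc_target(center_deg, width_deg=10.0):
    """Sinc-function-like target magnitude; width_deg is an assumed design parameter."""
    field = []
    for angle in ANGLES_DEG:
        x = math.pi * (angle - center_deg) / width_deg
        field.append(1.0 if x == 0 else abs(math.sin(x) / x))
    return field


def delta_target(center_deg):
    """One-hot (delta-like) target with a single nonzero sample."""
    return [1.0 if angle == center_deg else 0.0 for angle in ANGLES_DEG]


def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


sinc_60 = sinc_target(60)
delta_60 = delta_target(60)
print(f"PCC(delta, sinc) = {pearson(delta_60, sinc_60):.3f}")
```

In use, `pearson` would be applied to a target field and the field actually produced by the predicted arrangement, so that a value near 1 indicates that the requested beam shape is physically reachable.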
Having solved the generation of the input target, a customer-desired target field can be generated. As shown in Figure 4c, four target fields are generated and plotted with gray dot-dash lines. The four targets correspond to the possible situations of the two users. 0 indicates that the channel at this angle is silent and inactive, which means that only bit 0 can be sent to the user standing at this position. In contrast, 1 indicates that the corresponding channel is activated. Once the neural network is fed with these targets, the predicted arrangements of equivalent states can be extracted; the corresponding predicted fields are plotted with orange lines in Figure 4c. Note that the arrangement predicted by the network (taken from the intermediate layer) is continuous, whereas the available equivalent states are discrete (eight states), possibly leading to a mismatch. To resolve this, we map each continuous value to the closest of the eight discretized states. The fields generated by the discrete-state arrangements are plotted with green dotted lines in Figure 4c. It can be seen that the fields generated by the discrete-state arrangements do not deviate much from the predicted fields. Incidentally, for more precise prediction, L can be enlarged, since the longer the time-varying series is, the more equivalent states can be selected.
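The discretization step can be illustrated with a nearest-neighbour lookup. This sketch assumes that proximity is measured as Euclidean distance in the complex plane, and it uses a placeholder codebook of eight equal-magnitude states; neither the metric nor the state values come from the original, where the eight states are those selected in Figure 2c,d.

```python
import cmath
import math


def nearest_state(value, codebook):
    """Index of the codebook entry closest to a continuous complex prediction."""
    return min(range(len(codebook)), key=lambda k: abs(value - codebook[k]))


def quantize(arrangement, codebook):
    """Map every continuous prediction in an arrangement to a discrete state index."""
    return [nearest_state(v, codebook) for v in arrangement]


# Placeholder codebook: eight states of magnitude 0.5 spaced 45 degrees apart (hypothetical).
codebook = [cmath.rect(0.5, math.radians(45 * k)) for k in range(8)]

# Hypothetical continuous outputs read from the intermediate layer.
predicted = [0.52 + 0.03j, -0.1 + 0.45j, -0.4 - 0.3j]
print(quantize(predicted, codebook))
```

Each returned index identifies one equivalent state, from which a matching time-varying sequence can then be looked up.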

Experimental Results
To experimentally validate the intelligent IM, we take the four situations of Figure 4c as a demonstration. The experimental setup and other details are shown in Note S4, Supporting Information. A monochromatic plane wave is incident on the spatiotemporal metasurface to excite harmonic waves. Receivers are placed in the ±60° directions at the same distance from the spatiotemporal metasurface to mimic the two users. All the results are normalized with respect to the value measured when the plane wave impinges on a same-sized metallic plate. The four target inputs of the tandem neural network are precisely the four target fields shown in Figure 4c.
We plot the received normalized intensities of the +1st and +3rd subcarriers in Figure 5. The concrete results of target 2 and target 3 are shown in Figure 5b. Evidently, the measured results match the predicted results in Figure 4c. When the intensity of a received harmonic exceeds a predefined threshold value, the main beam of the harmonic can be deemed to be steered to the user in the corresponding direction, and the subcarrier related to the harmonic can be considered activated.
For the +1st and +3rd harmonics, the selected threshold values are both −5.5 dB. The test results are in accord with the assumed demands. To be more specific, we assume that a segment of digital audio signal, 1100100100110110, needs to be transmitted, as illustrated in Figure 5e. With the IM method, we need to send only part of these bits via traditional modulation methods such as amplitude modulation, with the other bits being implicitly sent by the index numbers (written in parentheses) of the +1st and +3rd harmonics. The received signal can then be recovered simply by inserting these IM bits into the corresponding positions. Although only two harmonics are used here for demonstration, more harmonics can be involved, as demonstrated in Note S5, Supporting Information. In other words, these results can also be generalized to more sophisticated situations with more users and more harmonics involved, opening a new avenue for lightweight, intelligent, and efficient communication.
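The encode and recover steps can be sketched with a toy scheme. The bit split below is hypothetical and is not the mapping used in Figure 5e: it assumes that every symbol carries one index bit (0 selects the +1st harmonic, 1 the +3rd) followed by one conventionally modulated bit. The intensity levels in the idealized channel are made-up values; only the −5.5 dB threshold and the 16-bit segment come from the text.

```python
THRESHOLD_DB = -5.5                  # threshold quoted in the text for both harmonics
HARMONICS = {0: "+1st", 1: "+3rd"}   # index numbers as designated in the text


def encode(bits):
    """Split a bit string into (index bit, conventionally modulated bit) pairs."""
    if len(bits) % 2:
        raise ValueError("this sketch expects an even number of bits")
    return [(int(bits[i]), int(bits[i + 1])) for i in range(0, len(bits), 2)]


def detect_index(intensity_db):
    """Return the index of the single harmonic whose intensity exceeds the threshold."""
    active = [idx for idx, level in intensity_db.items() if level > THRESHOLD_DB]
    if len(active) != 1:
        raise ValueError("expected exactly one active harmonic")
    return active[0]


def decode(received):
    """Rebuild the bit string from (per-harmonic intensities in dB, demodulated bit) pairs."""
    return "".join(f"{detect_index(levels)}{bit}" for levels, bit in received)


message = "1100100100110110"
symbols = encode(message)
# Idealized channel: the active harmonic arrives at -3 dB, the idle one at -15 dB (made-up levels).
received = [({k: (-3.0 if k == idx else -15.0) for k in HARMONICS}, bit) for idx, bit in symbols]
print(decode(received) == message)
```

In this toy scheme half of the bits never need explicit modulation, which illustrates how index bits reduce the load on the conventional modulator.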

Conclusion
To conclude, we have demonstrated deep-learning-enabled spatiotemporal metasurfaces for IM. Spatiotemporal metasurfaces are a natural alternative for IM implementation because they provide many harmonic waves that match the inherent requirements of IM. The pretrained neural network agent acts as the driving force of the whole system, helping the spatiotemporal metasurfaces adapt themselves to diverse communication demands within several milliseconds. Compared with conventional heuristic algorithms, the deep learning method does not need to iteratively search for an optimal answer for each given input or to save predetermined spatiotemporal matrices in advance. The experimental two-channel IM might be generalized to sophisticated situations with more users and subcarriers. In principle, with an elaborate inverse design, countless free wavebands in a small area can be created by allocating harmonics in these bands, thus incorporating numerous additional bits. Our work opens up a new horizon for the design and application of intelligent metasurfaces and offers deep insight into the combination of inverse design, spatiotemporal metasurfaces, and wireless communications. We anticipate that this highly interdisciplinary and rapidly expanding field will advance quickly with novel deep learning algorithms and inspire other metamaterial-related applications.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.