A Simple Nonlinear Classifier Using a Multimode Optical Chip

Neural network accelerators based on photonic-integrated circuits are emerging as a promising technology for fast, power-efficient, and parallel computing. In such technology, optics is mainly used for linear transformations, e.g., through an array of cascaded switching networks. Nonlinear activation is implemented either electronically, requiring extra optical-electrical conversion, or via nonlinear optical materials that often suffer from high loss, large power consumption, and difficulty in integration. Herein, an optical neural chip with only one multimode waveguide, fabricated using low-cost linear optical materials, plus seven heater electrodes to control the multimode interference, is proposed. The nonlinear network is intrinsically integrated in the electrical-to-optical signal conversion through the waveguide. The linear computation, in the electronic domain, is included in the mandatory step of converting the input matrix to the intermediate current values on the seven electrodes. Though extremely simple, the proposed system can classify nonlinear datasets and images by optical readout with high accuracy and without calibration. Prospects for future development are given at the end. This work offers an alternative route to exploiting classic multimode interference for advanced optical computing applications.


Introduction
Artificial neural networks (ANNs) are computing systems inspired by the biological neural networks of animal brains. The fundamental task of an ANN is to make predictions based on existing data, and ANNs have hence been used in a wide range of applications in the fields of autonomous driving, [1] voice translation, [2] image processing, [3] and biomedicine. [4] However, with the fast development of ANN technology, current electronic computer hardware faces significant challenges in terms of processing speed and power consumption. [5] To overcome these challenges, innovative and disruptive alternative computer hardware is becoming an urgent need. [6] Optical computing has emerged as a promising technology for constructing neural network hardware with inherent parallelism, high speed, and low power consumption. [7] This field has also gathered considerable interest in machine learning applications. [8] Several optical neural network (ONN) architectures have been proposed. [9] Among these techniques, reconfigurable and programmable photonic-integrated circuits (PICs) have sparked substantial interest, thanks to their small footprint, high efficiency, and versatile multifunctional properties. [10,11] The key element of an ONN is the optical neuron, which should ideally perform both linear and nonlinear transformations on the input signals. In many PIC-based ONNs, however, optics is mainly used for linear operations, e.g., matrix multiplication. Such linear functions have been realized using programmable Mach-Zehnder interferometer (MZI) arrays, [12,13] micro-loop modulators, [14] and multimode interference (MMI) couplers with nanometer-patterned coupler regions. [15] However, advanced neural networks require nonlinear activation, similar to the function of synapses in the brain's nervous system. [16]
The nonlinear function is essential to accelerate the convergence rate of the network as well as to improve the recognition accuracy. For digital processors, it is straightforward to use an existing mathematical formula (e.g., the sigmoid, ReLU, and tanh functions) or to design a specific function to carry out the nonlinear operation. In a photonic neural network, however, this is not straightforward to implement. Currently, there are two main solutions. One relies on electro-optic activation structures. [17,18] Such schemes require effective optical-electrical (OE) conversion, which necessitates not only the integration of extra photonic components such as photodiodes and modulators but also extra electronic processing units in between. The other solution relies on nonlinear optical components. Such ONNs have been demonstrated using phase-change materials, [19] electromagnetically induced transparency, [20] and the nonlinear optical absorption effect. [21] However, optical nonlinearity typically requires high optical power and therefore significantly lowers the power efficiency of the system. [22] The stability and the integration technology of the advanced nonlinear materials would also require further development.

DOI: 10.1002/adpr.202300253
In this work, we propose a nonlinear optical neural chip based on a simple multimode waveguide using only linear optical materials. The device can deliver classification results by optical readout without any digital nonlinear computation. The nonlinearity is intrinsically introduced in the electrical-to-optical (EO) signal conversion in the waveguide chip. To express the hidden relation between the electrical inputs and the optical outputs through the perturbed MMI effect, an ANN is first constructed. The dataset is collected by the newly developed function programmable waveguide engine (FPWE) to replace the time-consuming and often error-prone numerical simulations. [23] The system can automatically gather the optical output data in response to the scanning input currents via scripts on a computer. After the multimode neural network (MNN) is trained, a fully connected electronic layer (FCEL) is added in front of the MNN to construct the full neural network (FNN). The FCEL performs only linear matrix multiply-accumulate (MAC) operations. It converts the digital data to a set of current values that drive the thermal electrodes on the multimode waveguide, thereby manipulating the interference pattern for different readouts. It also allows the weights to be adjusted to train the FNN, after which the classification result is read out at the waveguide port that gives the highest optical power/brightest light spot.

System Architecture and Design Method
The proposed EO-MNN system is described in Figure 1a. The FCEL function is shown in Figure 1b. The optical layer with the thermally tuned multimode optical chip is shown in Figure 1c. The FCEL serves as an interface that converts digital data into a set of current values to drive the thermal electrodes on the multimode waveguide. The waveguide chip works as the complex nonlinear hidden layer and also the output layer. To construct the complete EO-MNN, the multimode waveguide is first studied to express the hidden relation between the electrical inputs and the optical outputs. The 7 × 1 current array (E) represents the current values applied to the seven microheaters on the multimode waveguide. The 3 × 1 output array (O) represents the three optical powers at the chip output.
For the computation process, the input signal is initially reshaped into a vector (I) of size m × 1. The FCEL transforms the input signal (I) into the 7 × 1 current array (E). The weight (W FCEL) and bias (b) have sizes 7 × m and 7 × 1, respectively. After that, the current array (E) is fed to the MNN to calculate the output optical powers (O). The port with the highest power/brightest spot represents the calculation result. For example, when O1 is the maximum output power, the recognition result is Class 1. With the MNN as a separate module, the FNN (EO-MNN) can be established and trained by the standard backpropagation algorithm. In this work, we only train the electric weight (W FCEL) and bias (b) and leave the MNN as a fixed modular subnetwork.
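The computation process above can be sketched in a few lines of Python. This is not the authors' code: the true MNN is the physical waveguide (or its trained surrogate), so `mnn` below is only a random nonlinear placeholder with the right 7-to-3 shape, and all weights are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

m = 2                              # input vector size (e.g., Circles has two features)
W_fcel = rng.normal(size=(7, m))   # FCEL weight, 7 x m (trainable)
b_fcel = rng.normal(size=(7, 1))   # FCEL bias, 7 x 1 (trainable)
A_mnn = rng.normal(size=(3, 7))    # stand-in for the fixed waveguide response

def mnn(E):
    """Placeholder nonlinear map from 7 heater currents to 3 optical powers."""
    return np.tanh(A_mnn @ E) ** 2  # squared so the "powers" are non-negative

def classify(I):
    E = W_fcel @ I + b_fcel         # linear MAC in the FCEL: E = W*I + b
    O = mnn(E)                      # optical powers at the three output ports
    return int(np.argmax(O))        # brightest port gives the class index

print(classify(np.array([[0.5], [-1.0]])))  # prints 0, 1, or 2
```

Training then adjusts only `W_fcel` and `b_fcel`, exactly as the text describes, while `mnn` stays fixed.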
The design of the multimode optical chip is shown in Figure 1c. The one input and three output waveguides are single mode and placed symmetrically with respect to the center of the multimode region. The MMI waveguide is designed with a width of 70 μm to support a sufficient number of modes and a length of 2500 μm to ensure adequate interference between these modes. This design facilitates a complex nonlinear relation between the electric inputs and optical outputs. Seven microheaters are added on the surface of the top cladding. The heaters are made of gold and have identical structures. The heater length is the same as that of the multimode waveguide. The numbers of electrodes (MNN inputs) and output waveguides (MNN outputs) are determined according to the specific neural network task. The size and locations of the electrodes also play an important role in molding the light path via modulated interference. Such a waveguide platform has been developed using polymer materials. The polymer materials feature a relatively large thermo-optic coefficient and low thermal conductivity at the same time, which can enhance the tuning efficiency of the thermally controlled MMI chip. The detailed design, fabrication, and testing of the polymer waveguides have been covered in our previous works. [24] In this configuration, light from a continuous-wave (CW) laser at constant power is injected into the optical chip as the carrier. The multimode waveguide is the medium that translates the input signal, as current variations, into light intensity variations at the output ports. The thermo-optic effect is adopted in this work for proof of concept. The local refractive index of the multimode waveguide is tuned to manipulate the mode number, profile, and propagation constant, as well as the coupling coefficients with the input/output waveguides. In turn, the light power at the output ports can be indirectly steered. [23,25]
Although the structure of the thermally tuned waveguide is extremely simple, an intricate EO response is established through a chain of electrical-to-thermal-to-optical signal transitions. The end effect is that this waveguide chip offers a dedicated nonlinear network that can be embedded in an FNN to enable artificial intelligence (AI) applications such as the classification of nonlinear datasets. The processing speed of the proposed EO-MNN system is limited by the thermo-optical modulation speed of the chosen polymer material, which is on the millisecond level. [26] Other index-tuning methods, such as the ultrafast electro-optic effect, [27] will also work on their respective material platforms, so long as the MMI can be effectively altered.
It is worth noting that the input electrical-to-optical signal conversion is a crucial step for almost all optics-based ANNs, as the information itself, e.g., an image, comes from the electronic domain. However, for ONNs based on single-mode MZI arrays, the inherent nonlinearity in the electrical-to-optical signal conversion is largely overlooked. By applying a DC bias voltage to the MZI, i.e., setting the work point, the MZI traditionally operates in the linear region between the electric input and the optical output. The architecture and training approach proposed in this work offer some key benefits for the multimode system. First, the optical computing undertakes the majority of the neural network functions. Electric computing performs only linear MAC operations and serves as the front-end data conversion interface. The sorting result is readily read out from the output of the multimode optical chip without subsequent processing. Furthermore, the multimode optical chip offers a modularized nonlinear network of its own. Training the FNN requires only simple and fast adjustment of the weights of the linear FCEL. Last but not least, the MNN requires only one lumped modulation unit, where all electrodes are applied and work jointly on one multimode waveguide. For comparison, the number of individual modulators for the single-mode system scales linearly with the port count, eventually leading to a cumbersome, complex, and high-cost input network. The proposed electro-optical multimode neural network (EO-MNN) may bring optical computing technology one step closer to practical applications in the field of ONNs.

Experiment System, Data Acquisition, and MNN Training
The optical chip is integrated and tested under the FPWE technology, as shown in Figure 2a. A custom-made microcontroller unit (MCU), shown in Figure 2b, is developed as parallel current sources to drive the electrodes on the optical chip. The CW laser at 1550 nm is coupled into the chip by a standard single-mode fiber, as shown in Figure 2d,e. The central computer, shown in Figure 2c, provides the interface to update the current settings on the MCU and synchronously records the light power captured by the infrared camera (Bobcat-640, 640 × 512 pixels, 16-bit resolution). In general, the FPWE technology allows us to control the MMI on the chip via a set of electrodes, monitor the optical output, and form a feedback to the control signals for advanced in situ light-field modulation. More details of the integration technology, the operation principle, and the application examples can be found in our previous studies. [23,25] In essence, the optical output is obtained entirely experimentally, without running tedious simulations. A script is written to scan the currents in sequence, and after running the engine for a few hours, the output data are acquired. This dataset allows us to find the hidden relation between the input and output arrays using the well-established training techniques from deep learning technology. [28,29] After a few trials, a simple network has been found that can well represent the relation between the input and output arrays. The architecture is shown in Figure 3a, in which the three hidden layers consist of 14, 28, and 42 neurons, respectively. The tangent-sigmoid activation function is implemented for the nonlinear activation in each neuron, as shown in Figure 3b. During the training process, the common Levenberg-Marquardt backpropagation algorithm is chosen as the resilient and efficient feedback approach to update the weights. [30] The mean-squared-error (loss) value during training is displayed in Figure 3c. After 715 epochs, the mean squared error is reduced below 2.4 × 10⁻⁴, proving that the chosen MNN is highly correlated with the experiments.
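The surrogate MNN architecture described above (7 inputs, hidden layers of 14, 28, and 42 tangent-sigmoid neurons, 3 outputs) can be sketched as a plain forward pass. The weights here are random placeholders rather than the fitted values, and the linear output layer is an assumption; in the work above the parameters are fitted to the measured current/power dataset with Levenberg-Marquardt backpropagation.

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = [7, 14, 28, 42, 3]  # layer widths from Figure 3a
layers = [(rng.normal(scale=0.5, size=(n_out, n_in)),   # weight matrix
           rng.normal(scale=0.5, size=(n_out, 1)))      # bias vector
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mnn_forward(E):
    """Map a 7x1 current array to a 3x1 optical-power estimate."""
    a = E
    for i, (W, b) in enumerate(layers):
        z = W @ a + b
        # tangent-sigmoid activation in each hidden neuron (Figure 3b);
        # a linear output layer is assumed here
        a = np.tanh(z) if i < len(layers) - 1 else z
    return a

E = rng.uniform(0.0, 1.0, size=(7, 1))
print(mnn_forward(E).shape)  # (3, 1)

# The training loss of Figure 3c is the mean squared error against
# the measured powers, e.g.:
measured = np.zeros((3, 1))  # placeholder for a measured output
mse = float(np.mean((mnn_forward(E) - measured) ** 2))
```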
It is worth noting that the network representing the thermally tuned multimode waveguide may not be unique. In principle, any architecture can be chosen as the MNN candidate so long as the trained network matches the experimental data with minimal loss or error. The complexity of the MNN depends essentially on the input/output array sizes, but also on the physical properties of the multimode waveguide, such as the total number of guided modes, the propagation distance, etc. An in-depth investigation of the MNN will be carried out as future work. Here, the target is to treat the optical chip as an appropriate modular subnetwork so that the MNN can be readily nested in the FNN to complete the AI function as a classifier.

EO-MNN for Classification Tasks
Once the MNN is established, the FCEL is inserted in front of the MNN to construct the FNN, i.e., the EO-MNN. The computational procedure of the EO-MNN is described in the second section. During the training of the EO-MNN, only the weights (W FCEL) and bias (b) in the FCEL are adjusted, while the MNN remains a fixed subnetwork. As the MNN is represented by an equivalent network with known mathematical expressions, the EO-MNN can be trained by the standard backpropagation algorithm. [31] In the forward calculation, the input electric signal is initially reshaped into a vector. The essential target of the FCEL is to convert the vector representation of the input signal to the seven current values, which are then used as the input to the MNN to compute the final optical output.
In our training programs, the one-hot encoding method is used to define the category as a vector with binary values, in which only the single element corresponding to the correct class is 1, while all others are 0. For example, Class 1 corresponds to the vector [1, 0, 0], Class 2 to [0, 1, 0], and Class 3 to [0, 0, 1]. A task-specific loss function L is defined to compute the difference between the prediction and the target, where Y target is the defined target vector, in which the value represents the correct class, and Y is the normalized vector of the output optical power containing the three calculated values from the EO-MNN. In the backward propagation, the loss function L is propagated layer by layer from the output to the input. The weights (W FCEL) and bias (b) in the FCEL are regulated by the error feedback. The continuous modification of the weight values brings the real output of the EO-MNN closer to the expected one.
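The one-hot targets and the comparison against the normalized optical powers can be sketched as follows. The exact form of the task-specific loss L is not spelled out above, so a squared-error loss on the normalized powers is assumed here purely for illustration.

```python
import numpy as np

def one_hot(label, n_classes=3):
    """Class k -> binary vector with a single 1 at position k."""
    y = np.zeros((n_classes, 1))
    y[label] = 1.0
    return y

def loss(O, label):
    """Assumed squared-error loss between normalized powers and the target."""
    Y = O / np.sum(O)             # normalize the three optical powers
    Y_target = one_hot(label)
    return float(np.sum((Y - Y_target) ** 2))

O = np.array([[0.7], [0.2], [0.1]])  # example optical powers
print(one_hot(0).ravel())            # [1. 0. 0.]
print(round(loss(O, 0), 3))          # 0.14
```

In training, this scalar is backpropagated through the fixed MNN expressions down to W FCEL and b, which are the only parameters updated.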
In the backpropagation algorithm, an optimizer based on the adaptive moment estimation method is applied to adjust the weights, as the method is known to reach efficient computing results with simple implementation and low memory demand. [32] The EO-MNN is first tested offline using the nonlinear classification dataset Circles. This type of classification task is commonly adopted to verify the ability of a network to solve nonlinear problems. [33] The term "offline" means that at this stage the training and testing take place on the computer only, without actually driving the optical chips. The purpose of the offline method is to complete the FCEL, evaluate the EO-MNN performance, and serve as a comparison basis for the online experiments at a later stage. The dataset comprises two inputs [x1, x2] and three classes of data points distributed on three concentric annuluses, as depicted in Figure 4a. The objective of the task is to classify the input data points according to their corresponding annuluses. The task exhibits entanglement and linear inseparability. [34]
In total, 1200 datapoints are used for training, while an additional 300 datapoints are reserved for testing. The architecture of the EO-MNN for Circles classification is shown in Figure 4b. The FCEL transforms the two inputs into seven current values, which are then used as input for the MNN to compute the classification result. Additionally, a three-layer linear neural network (Linear-NN) is trained for comparison with the nonlinear EO-MNN. The structure of the Linear-NN is shown in Figure 4c. It is a purely linear network without nonlinear activation functions. The training results of the EO-MNN and the Linear-NN are shown in Figure 4d. Employing the same training method, the EO-MNN achieves a test accuracy of 88.3%, significantly surpassing the 51.7% accuracy achieved by the Linear-NN. This suggests that, with the addition of the nonlinear MNN, the EO-MNN shows better representation and learning capabilities, leading to enhanced performance on the nonlinear dataset.
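A Circles-style dataset of the kind described above is easy to generate: three classes of points scattered on three concentric annuluses. The radii and noise level below are illustrative choices, not values from the work above.

```python
import numpy as np

def make_circles(n_per_class=400, radii=(1.0, 2.0, 3.0), noise=0.1, seed=0):
    """Three classes of 2D points on three concentric annuluses."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for label, r in enumerate(radii):
        theta = rng.uniform(0.0, 2 * np.pi, n_per_class)   # random angles
        rad = r + rng.normal(scale=noise, size=n_per_class)  # noisy radius
        X.append(np.column_stack([rad * np.cos(theta), rad * np.sin(theta)]))
        y.append(np.full(n_per_class, label))
    return np.vstack(X), np.concatenate(y)

X, y = make_circles()
print(X.shape, y.shape)  # (1200, 2) (1200,)
```

No linear map of [x1, x2] separates these classes, which is why the purely linear Linear-NN stalls near chance level while the nonlinear EO-MNN succeeds.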
The online testing of the EO-MNN is performed on the FPWE system. The weights and biases in the FCEL are taken directly from the offline training. The currents are applied to the respective electrodes on the waveguide chip. The output vector, in terms of the optical power received by the camera from the three waveguide outputs, is automatically obtained. Figure 5a illustrates the calculation process and shows an example of the experimental results for the Circles classification. When different signals are given as input, the classification result is read out immediately at the waveguide port that gives the brightest light spot. The results are shown in Figure 5b. The online classification accuracy achieved is 88.0%, which is very close to the offline prediction (88.3%).
In both the offline training and the online experiment, the classification accuracy of Class 2 is relatively low. This can be attributed to the constrained learning capability of the EO-MNN caused by the limited number of neurons in the interference layer. To enhance the performance of the EO-MNN, an expanded MNN can be developed by increasing the size of the multimode waveguide and the number of electrodes. A detailed discussion is provided in the following section.
To verify the applicability of the EO-MNN, we also trained the EO-MNN on clothing, handwritten digits, and handwritten letters as image recognition tasks. The architecture of the EO-MNN for these tasks is the same as in the previous task. The input image (28 × 28) is initially reshaped into a vector (784 × 1). The FCEL is then added to transform the image vector into the current array (7 × 1). After that, the current array is fed to the MNN to calculate the classification result. The clothing images are from MNIST-Fashion. [35] We choose the images of T-shirts, trousers, and sandals as the representative recognition targets, while other clothing types work equally well. The training results of the clothing classification are displayed in Figure 6a. In total, 18 000 images of the three clothing types are used for training and another 3000 are used for testing. After training, the train accuracy reaches 97.7% and the test accuracy is 97.1%. The images of the handwritten digits are taken from the common dataset MNIST. [36] As shown in Figure 6b, the train accuracy reaches 95.4% and the test accuracy is 95.6% after 519 epochs. The images of the capital letters A, B, and C are from the EMNIST dataset. [37] Similar accuracies are also reached after 603 epochs, as demonstrated in Figure 6c. The three datasets are also evaluated on the FPWE system. To evaluate the experiments properly, the complete testing datasets are used in the experiments for each category, i.e., 3000 images for clothing, 3147 for handwritten digits, and 1200 for handwritten letters. The results are listed in Figure 6d-f. The online classification accuracies achieved are 95.8%, 94.6%, and 94.5%, respectively. The related videos for the classification results are provided in the Supporting Information.
Four datasets are used to test the applicability of the proposed EO-MNN system. In the case of the Circles dataset, the nonlinear subnetwork MNN demonstrates a significant improvement in classification accuracy. For all three image classification tasks, the online experimental classification accuracy is above 94.5%. Further tests show similar accuracies using different clothing types, digits, and letters within their respective datasets. The accuracies achieved in the experiments are slightly worse than the offline predictions. Improvement can be made on the MNN side, with a larger collection of data and a more accurate construction of the equivalent neural network. On the experimental side, an upgraded FPWE platform with extra control units for thermal, electrical, optical, and mechanical stability will improve the accuracy in data transmission and acquisition. Nevertheless, this work has proven the functionality, compactness, and effectiveness of the proposed EO-MNN system and may trigger a variety of interesting applications to be explored.

Discussion and Conclusion
To expand the EO-MNN for solving larger and more advanced problems, a larger MNN can be developed by increasing the size of the multimode waveguide, the number of electrodes, and the number of output waveguides. The same data collection and training process can be applied to establish the MNN. This large MNN can then be embedded as a modular subnetwork to complete the FNN with a proper EO interface. This two-level training method provides some advantages. In terms of hardware, the optical neural chip remains simple and compact. The nonlinear activation is intrinsically integrated in the electrical-to-optical signal conversion, so that nonlinear optical materials or electronic nonlinear computing are not needed. In terms of the training process, the weights are adjusted quickly and efficiently in the linear electric layers only. The equivalent, modular MNN approach bypasses the need to pinpoint the complex OE backpropagation process.
One concern regarding the construction of a large MNN is that the required dataset and training effort may increase rapidly with the increasing sizes of the input and output vectors. On the fundamental level, the design rules must be explored further, based on an in-depth understanding of the underlying physics and its impact on the neural network structure. For example, the relation between the neural network properties, in terms of the layer and neuron numbers, and the waveguide properties, in terms of the supported mode number, propagation distance, and electrode layout, should be more clearly depicted to enable a more general, systematic, and fast implementation of the MNN approach.
In the meantime, one can also adopt a hybrid method, in which a small-to-medium-scale multimode waveguide with a known equivalent network is adopted as a basic MNN, and these MNNs are then cascaded to construct a larger network. At the device level, the outputs from one multimode waveguide can be connected to the inputs of the multimode waveguides in the subsequent layer. This cascaded MNN architecture, using the same thermally tuned multimode waveguide units, can save time in data collection and effectively expand the scale. Furthermore, with an increased number of electrodes and output waveguides, we can not only construct the EO-MNN with the electric layer for weight adjustment, but also try to train the MNN for an all-optical ONN, so as to further reduce the proportion of the electronic calculation. For example, one may include a large network of index-tuning electrodes on the multimode waveguide, within which a subnetwork is used as the input while the rest are adopted as "weights" to adjust and define the multimode waveguide by task-specific algorithms.
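The cascading idea can be sketched as simple function composition: one trained waveguide surrogate is reused as a building block, with linear interface layers mapping its 3 outputs back onto the 7 inputs of the next stage. The block below is purely structural; the unit and interface weights are random stand-ins, not trained values.

```python
import numpy as np

rng = np.random.default_rng(2)
A_unit = rng.standard_normal((3, 7))  # one fixed 7 -> 3 waveguide surrogate

def mnn_unit(E):
    """Placeholder for one thermally tuned multimode waveguide block."""
    return np.tanh(A_unit @ E) ** 2

# Linear interfaces (3 -> 7) between stages, analogous to connecting the
# output waveguides of one chip to the electrodes of the next.
interfaces = [rng.standard_normal((7, 3)) for _ in range(2)]

def cascade(I):
    """Chain identical MNN units through linear interface layers."""
    x = I                      # 7 x 1 input currents
    for W in interfaces:
        x = W @ mnn_unit(x)    # 7 -> 3 (unit), then 3 -> 7 (interface)
    return mnn_unit(x)         # final readout: three output ports

print(cascade(np.ones((7, 1))).shape)  # (3, 1)
```

Because every stage reuses the same characterized unit, only the linear interfaces would need training, mirroring the data-collection savings argued above.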
[38,39] Compared with other single-mode architectures, such as the MZI mesh [13] and the micro-loop modulator array, [14] the EO-MNN architecture offers distinct advantages. At the input side, the MNN requires only one lumped modulation unit, where all electrodes work jointly on the multimode waveguide. In contrast, the number of individual modulators in the single-mode system increases linearly with the port count, resulting in a more complex and costly input network. The single-mode architectures are limited to linear transformations, and the nonlinear activation relies on OE conversion and often extra processing in a computer, whereas the MNN integrates nonlinearity in the electro-optical conversion process through a chain of electro-thermo-optical effects. Furthermore, the design of the multimode optical chip is remarkably simple, consisting of just one multimode waveguide with several access waveguides. The number of control units can also be flexibly adjusted for different tasks (Table 1).
To conclude, we have proposed an optical neural chip based on MMI with a simple and effective EO-MNN architecture. We have demonstrated that such a neural chip can be used for classification tasks and can generate output directly in the optical domain. For the design method, an equivalent network, the MNN, is first constructed based on the experimental dataset from the FPWE system. The MNN unveils the hidden relation between the electronic input and the optical output of the thermally tuned multimode waveguide. Though the waveguide structure is extremely simple, it represents a complex neural network with intrinsic nonlinear activation. The FNN is built with a linear electronic layer, the FCEL, as the interface to convert the digital input signals to a set of current values applied to the waveguide chip. The FCEL also allows fast and efficient linear weight adjustment in the electronic domain. The EO-MNN is first trained offline on four different datasets. The subsequent online experiments agree well with the offline training, proving the feasibility and generality of the proposed method.
Table 1. Comparison among the different optical neural network architectures. N is the input and output port number.

Architectures compared in Table 1: N × N MZI mesh, [13] N × N micro-ring resonator array, [14] and the proposed EO-MNN.

As future work, the fundamental relation between the underlying physics of the MMI and the neural network architecture should be explored more deeply to lay out more general design rules. The EO-MNN can then be developed systematically for specific problems, and its general applicability will be investigated thoroughly. On the practical side, a hybrid approach using cascaded EO-MNNs can be implemented to solve advanced, large-scale problems. The complex nonlinearity in the MNN can also be further leveraged to achieve desired nonlinear functions. Nevertheless, we believe this work may attract more attention to applications of PICs for computational purposes in the multimode regime, thanks to the simple structure and the two-step, modular training method.

Figure 1 .
Figure 1. a) The architecture of the electro-optical multimode neural network (EO-MNN). b) Fully connected electronic layer (FCEL) with adjustable weight (W FCEL) and bias. c) Multimode waveguide chip. The electronic currents applied to the seven microheaters serve as the input array [E1, E2, E3, E4, E5, E6, E7], and the optical powers at the output waveguides serve as the output array [O1, O2, O3].

Figure 2 .
Figure 2. a) Photo of the experimental setup. b) Photo of the microcontroller unit (MCU) current source. c) A computer is connected to control and update the microheaters through the MCU, and also to receive and store the obtained images from the camera. The MCU and the camera are synchronized. d) Photo of the 1550 nm continuous-wave laser. e) Zoomed-in view of the optoelectronic assembly of the chip.

Figure 3 .
Figure 3. a) Architecture of the MNN. b) Summing and activation operation process of each neuron in the MNN. c) The loss variation during the MNN training process.

Figure 4 .
Figure 4. a) Visualization of the dataset Circles with partial datapoints. The dataset has two inputs [x1, x2] and three classes of datapoints on three different concentric annuluses with different colors: Class 1 corresponds to the blue datapoints, Class 2 to the red, and Class 3 to the yellow. b) The architecture of the EO-MNN for Circles classification. c) The architecture of the linear neural network (Linear-NN) for Circles classification. d) The offline training results showing the numerical accuracy versus epoch number for the EO-MNN training and the Linear-NN training.

Figure 5 .
Figure 5. a) The calculation process of the online experiment for the Circles classification. b) The confusion matrices from the online experimental results for the EO-MNN.

Figure 6 .
Figure 6. The offline training results showing the numerical accuracy and the loss versus epoch number for the different datasets: a) Modified National Institute of Standards and Technology (MNIST)-Fashion dataset, b) MNIST dataset, and c) Extended MNIST (EMNIST) dataset. The online experimental results for the EO-MNN with measured intensity distributions and the confusion matrices for the different datasets: d) MNIST-Fashion, e) MNIST, and f) EMNIST.