Dual Optical Frequency Comb Neuron: Co-Developing Hardware and Algorithm

Previous studies on photonic neural networks have demonstrated that algorithms can inspire hardware design. This study seeks to demonstrate that hardware can also inspire algorithm design. To further exploit the advantages of photonic analog computing, the authors develop hardware and algorithms simultaneously for photonic convolutional neural networks. Specifically, this work develops an architecture called the dual optical frequency comb neuron (DOFCN), enabled by an integrated microcomb, that performs cosinusoidal nonlinear activation and vector convolution without temporal or spatial dispersion or large-scale modulator arrays. Furthermore, DOFCN-based composite vector convolutional neural networks (CVCNNs), an optical-electric hybrid model, are proposed and evaluated on classification and regression tests, namely signal modulation format identification and optical structure inverse design tasks, respectively. The ablation experiments show that, under a 4-bit precision limit, the element-wise activation CVCNN achieves 14% higher classification accuracy, 76% lower regression residuals, and 100% higher training efficiency than the 32-bit standard convolutional neural network (CNN). DOFCN exhibits impressive spectral information processing ability to facilitate signal-processing tasks related to optics and electromagnetics.

into vector-matrix multiplication temporally;[8,9,29] and 3) Fourier transform, which converts the original time-domain vector convolution into a space-domain vector product.[37] However, these three methods share two issues. First, the parameters of the convolution operation are hardware-defined, making flexible adjustment challenging. Second, the convolution operation relies on immense modulator arrays or large dispersion, restricting system integration. For nonlinear activation, there are two typical schemes. The first uses nonlinear optical effects,[13,38,39] such as saturable absorption, electromagnetically induced transparency, and the narrowband gating characteristics of microrings. The second depends on nonlinear electrical effects,[40] such as nonlinear power amplification. However, most of these increase hardware overhead.
To address these problems, we construct a dual optical frequency comb neuron (DOFCN) that simultaneously realizes vector convolution and cosinusoidal nonlinear activation using an integrated microcomb, orthogonal frequency division multiplexing (OFDM), and coherent detection (CD).[41] Additionally, we abstract the DOFCN as a composite vector convolution (CVC) operator and build CVC neural networks (CVCNNs). Subsequently, we select signal modulation format identification (MFI) and optical metamaterial structure inverse design (SID)[42,43] tasks to test the classification and regression capabilities of CVCNNs, respectively. The ablation experiments reveal that CVCNNs outperform the Baseline (32-bit CNNs) reported in refs. [44,45] in both tasks, even under a 4-bit accuracy limit. The novelty of this study can be summarized as follows: 1) We propose and experimentally verify the DOFCN architecture. a) The convolution vector length of DOFCN is software-defined; in addition, the agilely adjustable weights and inputs enhance the flexibility of PCNNs compared with preceding studies. b) DOFCN compactly and simultaneously performs vector convolution and nonlinear activation without introducing temporal or spatial dispersion or large-scale modulator arrays at the front end. 2) We develop CVCNNs for DOFCN and conduct ablation experiments on MFI (classification) and SID (regression) tasks. a) The classification, regression, and training capabilities of 4-bit CVCNNs exceed those of 32-bit CNNs of the same size. b) A 14% improvement in classification accuracy, a 76% reduction in regression residuals, and a 30% improvement in training efficiency are observed for element-wise activation CVCNNs compared with CNNs.

Dual Optical Frequency Comb Neuron
The convolution theorem states that the product of two time-domain signals corresponds to the convolution of their spectra, and vice versa. Consider encoding two vectors A and B into the spectra of two signals and multiplying the signals in the time domain to obtain A∗B (∗ denotes the convolution operation) in the frequency domain. Video S1, Supporting Information, shows this process in detail. Accordingly, we designed an architecture called DOFCN (Figure 1). As shown in ① in Figure 1a, we use the integrated microcomb to generate the original optical frequency combs (OFCs). The combs in a specific frequency band are then selected, shaped, and amplified by a bandpass filter and an EDFA to obtain flat-top OFCs with a starting wavelength of 1550.815 nm. Next, we split the optical combs into two beams, L and S, corresponding to ② and ③ in Figure 1a. L passes through an electronically controlled optical delay for optical path correction, an amplitude modulator (AM) for single-sideband modulation (which can also be replaced with an IQ modulator), and an EDFA for optical signal compensation. S is similar to L, except that a phase modulator (PM) for dynamic phase regulation at the front end of the link stabilizes the relative phases during coherent synthesis, thus avoiding random jitter. DOFCN uses OFDM (Figure 1b) to encode a composite vector that is mathematically equivalent to a two-dimensional vector. We encode the two dimensions of A and B into the amplitude and initial phase of the subcarriers to form the OFDM signal. Then, the OFDM signal is modulated onto S and L, respectively, by single-sideband amplitude modulation. The modulated spectra are shown in ④ and ⑤ in Figure 1a: each comb carries a copy of the OFDM signal while the optical carriers are suppressed. The two modulated and amplified optical signals are received by a CD system consisting of a 90° optical hybrid and two balanced photodetectors (BPDs). Finally, the forward and backward convolution results are extracted after data processing.
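This spectral-encoding trick can be checked numerically: place two vectors in the spectra of two (zero-padded) signals, multiply the signals in the time domain, and read the product's spectrum. A minimal numpy sketch, where the rescaling by n follows numpy's DFT normalization convention:

```python
import numpy as np

def spectral_convolve(A, B):
    """Convolve A and B by multiplying the time-domain signals
    whose spectra are A and B (convolution theorem)."""
    n = len(A) + len(B) - 1              # zero-pad so circular == linear convolution
    Ap = np.zeros(n, dtype=complex); Ap[:len(A)] = A
    Bp = np.zeros(n, dtype=complex); Bp[:len(B)] = B
    a_t = np.fft.ifft(Ap)                # time-domain signal with spectrum A
    b_t = np.fft.ifft(Bp)                # time-domain signal with spectrum B
    # product in time <-> (1/n) * convolution in frequency, so rescale by n
    return n * np.fft.fft(a_t * b_t)

A = np.array([1.0, 2.0, 3.0])
B = np.array([4.0, 5.0, 6.0])
Y = spectral_convolve(A, B).real
assert np.allclose(Y, np.convolve(A, B))   # [4, 13, 28, 27, 18]
```

Zero-padding to length n + m - 1 makes the circular convolution implied by the DFT coincide with the linear convolution that DOFCN computes.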
System calibration, control, and signal transceiving and processing are performed by an FPGA. A single DOFCN achieves a full convolution between two composite vectors; a parallel computing architecture can further extend the computational capability, as shown in Figure 1c. Because the beating process is not direction selective, the forward and backward convolutions are aliased, as shown in ⑥ and ⑦ in Figure 1a; green blocks represent decreased parts of the spectra, and pink patches represent increased areas. Figure S3, Supporting Information, shows the detailed aliasing process, and the mathematical derivation can be found in Section S2, Supporting Information. To simplify the discussion, we directly give the conclusion here. Let A and B be the two composite vectors defined in Equation (1). The outputs Y+ and Y− of DOFCN are given by Equation (2), where the constant C can be eliminated by power calibration. The series term of Y can be regarded as a generalized convolution, and when φ_a = φ_b, Y is equivalent to a standard vector convolution (SVC). As its input vector is a composite vector of a (or b) and φ_a (or φ_b), we name it the CVC. According to Equation (2), we articulate an equation for the arithmetic power measurement of DOFCN. Let the data baud rate be σ and the system arithmetic power be P. According to Video S1, Supporting Information, the number of multiplication operations involved in A∗B equals the number of elements in an n × m matrix, whereas the number of addition operations equals the number of elements in the matrix minus the number of diagonals. In addition, CVC also involves subtractions between phases and cosinusoidal transformations. The number of operations a computer requires to handle the cosine transform varies depending on the method; we consider it similar to the number of operations in one standard convolution operation, which yields the arithmetic power expression in Equation (3). Further, we visualize SVC and CVC in Figure 2, where four differences are apparent.
1) Different degrees of freedom of the input vectors: SVC has only two input degrees of freedom (A, B), whereas CVC has four (φ_a and φ_b in addition to a, b). 2) Built-in nonlinear activation: CVC comes with cosinusoidal nonlinear activation, whereas SVC requires additional nonlinear activation layers. 3) Different nonlinear activation types: commonly used nonlinear activations in SVC include Sigmoid and ReLU, but not cos. 4) Different nonlinear activation processes: CVC first activates the results element by element and then sums, whereas SVC sums first and then activates. Therefore, DOFCN applied to neural network models has four potential advantages. 1) More degrees of freedom to realize more complex operations. 2) The hardware incorporates nonlinear activation. 3) Nonlinear activation in cosine form is bounded, infinitely differentiable, and can be switched between even and odd parity by adjusting the initial phase, providing better learning ability for analytic signals such as radio frequency (RF) and optical fields. 4) Element-wise activation enables nonlinear decoupling between elements, which, from the perspective of Fourier analysis, increases the number of basis vectors and improves the model representation capability.
(Figure 1 caption, panels b,c: b) Pairs (a, φ) are encoded as the amplitude and initial phase of subcarriers with frequency interval Δω to form sub-signals; the final output signal is the sum of the sub-signals in the time domain. c) Extended architecture of DOFCN. The blue and red lines represent optical and electrical circuits, respectively. The black dashed box shows the architecture of a single DOFCN, consisting of two amplitude modulators (AMs), a 90° optical hybrid, and two BPDs for CD. DOFCNs use the OFCs as a laser source and for peripheral signal processing.)
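Reading the CVC definition literally, all products a_i·b_j along each convolution diagonal are weighted by cos(φ_a,i − φ_b,j) element by element before summation, and with equal phases the operator must collapse to a standard vector convolution. A sketch of that reading, which also tallies operations under the assumption that "diagonal elements" refers to the anti-diagonals of the n × m product matrix:

```python
import numpy as np

def cvc(a, phi_a, b, phi_b):
    """Composite vector convolution: cosine activation of the phase
    differences is applied element-wise BEFORE the summation over each lag."""
    n, m = len(a), len(b)
    y = np.zeros(n + m - 1)
    mults = adds = 0
    for k in range(n + m - 1):                         # each output lag
        terms = [a[i] * b[k - i] * np.cos(phi_a[i] - phi_b[k - i])
                 for i in range(max(0, k - m + 1), min(n, k + 1))]
        y[k] = sum(terms)
        mults += len(terms)                            # one a_i*b_j product per term
        adds += len(terms) - 1                         # t terms need t-1 additions
    return y, mults, adds

a, b = np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0, 7.0])
# Equal phases -> cos(0) = 1, so CVC collapses to a standard convolution
y, mults, adds = cvc(a, np.zeros(3), b, np.zeros(4))
assert np.allclose(y, np.convolve(a, b))
# Operation tally: n*m multiplications, n*m - (n+m-1) additions
n, m = len(a), len(b)
assert (mults, adds) == (n * m, n * m - (n + m - 1))
```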

Composite Vector Convolution Neural Networks
To verify the above conjecture, we first abstract DOFCN into a single-layer neural network and implement CVC in five different forms by adjusting the input-output connections (see Figure S4, Supporting Information). Suppose the input vectors are X and W. Then, the five forms are X∗W, cos(X∗W), Xcos(X)∗W, cos(X)∗W, and (X∗W)cos(X∗W), whose similarities and differences are discussed in Section S3, Supporting Information. Subsequently, we construct CVCNNs by replacing the convolution layers in CNNs with CVC layers. Finally, we validate the classification and regression performance of CVCNNs through two ablation experiments. In Figure 3, we perform an MFI task using the RML2016.10b data set[46] to test the classification performance of CVCNNs. Figure 3a visualizes the entire experiment. The model has four hidden layers. The inputs are IQ signals of 10 different modulation formats in the [−18, 18] dB signal-to-noise ratio (SNR) range with 2 dB intervals. The outputs are 10-class modulation format labels. Figure 3b explains the network architecture in more detail. The Baseline[44] is constructed from all the modules in the diagram with black boxes on a gray background. CVCNNs are obtained by replacing the gray-background modules in the red dashed box with the white-background modules while retaining the remaining modules. Since the input IQ signals have a length of 128, the corresponding input layer dimension is 2 × 128. The four hidden layers in the Baseline comprise two convolutional and two dense layers (dense is an alias of fully connected). The first convolutional layer has 256 one-dimensional kernels of length 3, while the second layer has 80 two-dimensional kernels of size 2 × 3.
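The five connection forms above can be written as one-liners around a shared convolution primitive. This is only a functional sketch: the actual DOFCN layers also carry learned phases and hardware calibration, which are omitted here.

```python
import numpy as np

conv = np.convolve  # shared convolution primitive ('full' mode by default)

forms = {
    "X*W":           lambda X, W: conv(X, W),
    "cos(X*W)":      lambda X, W: np.cos(conv(X, W)),
    "Xcos(X)*W":     lambda X, W: conv(X * np.cos(X), W),
    "cos(X)*W":      lambda X, W: conv(np.cos(X), W),
    "(X*W)cos(X*W)": lambda X, W: conv(X, W) * np.cos(conv(X, W)),
}

X = np.linspace(-1.0, 1.0, 8)
W = np.array([0.5, -0.25, 0.125])
outs = {name: f(X, W) for name, f in forms.items()}

# All five forms keep the full-convolution output length n + m - 1 = 10
assert all(len(v) == len(X) + len(W) - 1 for v in outs.values())
```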
The output layer uses one-hot coding for the ten classes of modulation formats and outputs the final classification results. The nonlinear activation used in the first three layers is ReLU, and that of the fourth layer is Softmax. For CVCNNs, the first two convolutional layers are replaced by DOFCN layers. The first DOFCN layer has 256 one-dimensional kernels and the second has 80, each of length 3. Since the maximum vector length of the prototype is 64 in a single shot, the signal is subsampled before being fed into DOFCN, transforming the signal dimensions from 2 × 128 to 2 × 64. This signal length constraint could be removed by increasing the DOFCN scale. We also applied CVCNNs to SID to test their regression performance (Figure 4a). The task aims to infer four parameters (L2, width, space, and period) of a metallic metasurface from the left- and right-circularly polarized (LCP and RCP, respectively) spectra. Please refer to ref. [45] for more details. Figure 4b shows the CVCNN architecture. The dimension of the input layer is 2 × 64. The network contains four hidden layers: two convolutional layers and two dense layers. The first convolutional layer uses 128 convolutional kernels of length 3, and the second uses 256. The network finally outputs the four parameter values. Since the task changes from classification to regression, ReLU is used for the nonlinear activations of the Baseline. In contrast, CVCNNs replace the convolutional layers of the Baseline with DOFCN layers, where the first DOFCN layer contains 128 convolutional kernels of length 3 and the second contains 256. Crucially, the CVC operator's effectiveness is closely related to initialization. Suppose the input is X and the first-layer weight is W_0. We give the following empirical strategy. a) The input X is multiplied by a scaling factor ζ, i.e., the first-layer expression is DOFCN(ζX, W_0). The scaling factors are given in Table 1.
b) DOFCN needs to maintain the same distribution between layers and should follow uniform instead of normal initialization. c) The fully connected layers use normal initialization.
(Figure 2 caption: A and B are the input vectors, Z is the shift direction, and σ(·) is the standard nonlinear activation function. The standard vector convolution, shown in the gray box, proceeds as follows: first, the Hadamard product (element-wise product) of A and B is computed; the summed AB is then activated by σ(·). The CVC process, shown outside the gray box, proceeds as follows: the Hadamard product of A(a) and B(b) is computed to obtain A(a)B(b). Simultaneously, the corresponding A(φ) and B(φ) are subtracted, and the cos(φ_a − φ_b) nonlinear activation is applied element by element. Finally, A(a)B(b) and cos(φ_a − φ_b) are multiplied element by element and then summed.)
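The empirical initialization strategy (a-c above) can be sketched as follows. The scaling factor ζ = 0.1, the uniform/normal ranges, and the layer widths here are placeholders for illustration, not the values from Table 1:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_dofcn_weights(shape, limit=0.1):
    # b) DOFCN (CVC) layers: uniform initialization keeps the
    # inter-layer distributions matched
    return rng.uniform(-limit, limit, size=shape)

def init_dense_weights(shape, std=0.05):
    # c) fully connected layers: normal initialization
    return rng.normal(0.0, std, size=shape)

zeta = 0.1                       # a) placeholder input scaling factor
X = rng.normal(size=64)          # placeholder input vector
W0 = init_dofcn_weights((3,))
# First-layer expression DOFCN(zeta * X, W0), sketched here as a convolution
first_layer_out = np.convolve(zeta * X, W0)
assert first_layer_out.shape == (66,)
```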

Dual Optical Frequency Comb Neuron
The inputs to CNNs are sparse and longer than the convolutional kernels. However, to ensure the robustness of DOFCN, we tested an extreme case in which the input lengths of signals S and L are both 128. The integrated microcomb spectrum is shown in Figure 5a. Although the full spectrum spans more than 200 nm and contains numerous combs, we selected only the six with the best quality. The wavelengths start at 1550.815 nm, the frequency interval is approximately 100 GHz, and the total power is 16 dBm after shaping and amplification. Physical diagrams of the microcomb are shown in Figure 5b,c; a more detailed description can be found in ref. [47]. The physical diagram of the core link is shown in Figure 5d. The input signal S corresponds to vector A, A = C[1, 1, …, 1] (128 elements), where C is the reference level. The frequency range is [100, 12800] MHz, with a frequency spacing of 100 MHz. The input signal L corresponds to vector B, B = C[1, 1, …, 1] (128 elements); its frequency range is [75, 12775] MHz, also with a frequency spacing of 100 MHz. Ideally, the forward convolution signal is Y+ = C²[128, 127, …, 1] (128 elements), whose frequency range is [25, 12725] MHz, with a total of 128 tones arranged in an arithmetic sequence. Similarly, the backward convolution signal Y− = C²[127, 126, …, 1] (127 elements), in the frequency range [75, 12675] MHz with 127 tones, is also arranged in an arithmetic sequence. Figure 5e,f shows the acquired spectrum before and after the system correction, respectively. The dark gray line is the original spectrum. The blue-gray and lime-green lines are the forward and backward convolution spectra corresponding to the original spectrum, respectively. The burgundy and pink lines are the theoretical results of forward and backward convolution, respectively. Owing to link distortion, the cyan-gray region in Figure 5e,f is larger than the burgundy region, and the dark gray background noise is noticeable.
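The triangular forward-convolution spectrum expected here can be reproduced in a scaled-down simulation: mixing an 8-tone comb at [100, 800] MHz with an 8-tone comb at [75, 775] MHz (a small stand-in for the 128-tone experiment) yields difference tones at 25 + 100k MHz whose amplitudes form the arithmetic sequence [8, 7, …, 1]:

```python
import numpy as np

fs, N = 4096.0, 4096                    # 1 MHz bin spacing; all tones land on bins
t = np.arange(N) / fs                   # time in microseconds, frequencies in MHz
f_s = 100.0 + 100.0 * np.arange(8)      # comb S: 100, 200, ..., 800 MHz
f_l = 75.0 + 100.0 * np.arange(8)       # comb L:  75, 175, ..., 775 MHz

S = np.sum(np.cos(2 * np.pi * np.outer(f_s, t)), axis=0)
L = np.sum(np.cos(2 * np.pi * np.outer(f_l, t)), axis=0)

spec = np.abs(np.fft.rfft(S * L))       # beating produces sum and difference tones
yplus_bins = 25 + 100 * np.arange(8)    # forward-convolution band: 25..725 MHz
amps = spec[yplus_bins] / (N / 4)       # each cos*cos cross term contributes N/4

# Triangular amplitude profile: 8, 7, ..., 1 (the small-scale analog of 128..1)
assert np.allclose(amps, np.arange(8, 0, -1))
```

Only the forward band is checked here; in this baseband simulation the backward-convolution tones share bins with the sum-frequency tones, mirroring the aliasing discussed above.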
As an analog computing architecture, DOFCN needs specific control and compensation algorithms[6,48–50] to achieve the desired effect. We discuss this in Section S4, Supporting Information.
Comparing Figure 5e,f, the cyan-gray area is significantly reduced, and the spurious dark gray is considerably suppressed after correction. Importantly, the integration time needed to obtain Figure 5e,f is long. In subsequent experiments, we therefore reduce the input signal length to 64 to balance speed and accuracy. The input signals S and L have frequency ranges of [100, 6400] and [75, 6375] MHz, respectively, with a frequency spacing of 100 MHz. Figure 6 illustrates the DOFCN output after spectrum extraction and compensation (Figure 6a).

Composite Vector Convolutional Neural Networks
We compare the simulation and experimental performance of five models in two tasks: MFI and SID. The five models are the Baseline, the Baseline without nonlinear activation, cos(X∗W), cos(X)∗W, and Xcos(X)∗W.
(Figure 5e,f caption: e) Comparing the area and envelope of the cyan-gray and burgundy regions shows that the actual and ideal results deviate significantly before correction. f) Spectrum of the calibrated DOFCN output; the legends of f) are the same as those of e). After calibration, the difference between the actual and ideal results is significantly reduced.)
The experiments are designed with the following objectives: 1) To investigate the effect of accuracy on DOFCN performance by comparing simulation and experimental results. 2) To investigate the effect of nonlinear activation on performance by comparing the Baseline with and without nonlinear activation.
3) To investigate the effectiveness of CVCNNs by comparing them with the Baseline. 4) To investigate the properties of CVCNNs by comparing different CVCNNs. Table 2 summarizes the maximum and mean recognition rates of the five models in the simulation and experiment. Figure 7a shows that the training results of the five models tend to converge as the number of iterations increases, but the convergence rates vary. Figure 7b shows the trend of the recognition rate of each model with different SNR inputs. The dashed and solid lines represent the simulation and experimental results, respectively. The results of CVCNNs applied to modulation format recognition are as follows. a) The impact of accuracy degradation varies across models. Except for the Baseline without nonlinear activation, the performance of the remaining four models degrades as the input accuracy decreases. cos(X∗W) deteriorates most significantly, with the maximum and mean recognition rates decreasing by 15.1% and 8.0%, respectively, whereas the performance of the remaining models deteriorates by 1.5%. The Baseline without nonlinear activation appears anomalous, with performance increasing as accuracy decreases; its maximum and mean recognition rates improve by 11.7% and 5.2%, respectively. The blue region in Figure 7b presents this phenomenon more intuitively. We speculate that limiting accuracy acts similarly to pooling, with a feature-fusion effect, and this needs further investigation. b) cos(X)∗W is the optimal model. The highest recognition rate of cos(X)∗W in the experiment is 85.25%, which is 14% higher than that of the Baseline, with a comparable mean recognition rate of 52.18%. A more detailed analysis is shown in Figure 7b. The burgundy line is the recognition rate of cos(X)∗W. When the input SNR is less than 0 dB, i.e., the noise power exceeds the signal power, the recognition rate of cos(X)∗W is lower than that of the Baseline.
The negative performance difference corresponds to the green region; in contrast, the positive performance difference corresponds to the yellow region. The performance inflection point at 0 dB is a common feature of the three forms of CVCNNs, attributed to their learning noise rather than signals. Regarding robustness, the burgundy area represents the gap between the simulation and experimental recognition rates of cos(X)∗W. The highest recognition rate decreased by 1.4%, whereas the mean decreased by 2.6%; the overall performance degradation is not apparent. Comparing the trends of the burgundy line (cos(X)∗W) and the green line (Baseline) in Figure 7a, the loss of cos(X)∗W decreases the fastest, indicating improved training efficiency: at 350 iterations, it is equivalent to the Baseline's loss at 4200 iterations. c) The overall performance of CVCNNs is comparable to that of the Baseline. Besides cos(X)∗W, the simulation peak and mean accuracies of Xcos(X)∗W are 10.85% and 4.00% higher than those of the Baseline, and 10.77% and 3.52% higher in the experiment, respectively. Its loss is slightly lower than that of the Baseline after 4200 iterations, indicating comparable training efficiency. Similarly, the simulation peak and mean accuracies of cos(X∗W) are 10.38% and 4.67% higher than those of the Baseline, but 15.62% and 20.83% lower in the experiment, respectively, indicating the sensitivity of cos(X∗W) to the input signal's accuracy. Its error is slightly lower than that of the Baseline after 4200 iterations, indicating comparable training efficiency. Overall, CVCNNs and the Baseline have comparable performance. The effects of CVCNNs applied to SID are shown in Table 3 and Figure 8. Table 3 records each parameter's residuals and the mean residuals for the five models in the simulation and experiment; smaller residuals represent a better regression outcome. Figure 8a shows the training trend: the faster the loss decreases, the more efficient the training.
Figure 8b-f shows the residuals between the spectra corresponding to a representative set of parameters predicted by each model and the standard spectra. The red and blue lines represent the RCP and LCP standard spectra, respectively. Apart from the red and blue stripes, the remaining solid lines represent the experimental values, while the dashed lines represent the simulated values. The experimental and simulation residuals are represented by four color blocks: green, blue, yellow, and red. The larger the area, the larger the relative error between experiment and simulation. The best case is when the warm- and cool-colored curves coincide with the red and blue ones, respectively. The results of CVCNNs applied to SID are as follows. a) The impact of accuracy degradation varies across models. Comparing the simulation and experimental results of each model in Table 3, we find that the system accuracy degradation deteriorates the performance of all models to different degrees. The performance degradation of the models in ascending order is cos(X)∗W (0.016), Baseline without nonlinear activation (0.081), Xcos(X)∗W (0.082), cos(X∗W) (0.283), and Baseline (0.294). The color block areas of Figure 8b-f, in order from smallest to largest, are cos(X)∗W, Baseline without nonlinear activation, Xcos(X)∗W, cos(X∗W), and Baseline. b) CVCNNs perform better than the Baseline, and cos(X)∗W has the best performance. The mean residuals of the three CVCNNs are 0.142, 0.359, and 0.429 in ascending order and are higher than the simulation results of the Baseline. From Figure 8b-f, the fits of cos(X)∗W and Xcos(X)∗W are significantly better than those of the remaining three, and cos(X)∗W is the best. The performance of cos(X∗W) is comparable to that of the Baseline without nonlinear activation, and both are better than the Baseline.
As shown in Figure 8a, the three CVCNNs almost converge after 1500 iterations, indicating improved training efficiency over the Baseline, which fails to converge as quickly; cos(X)∗W converges after approximately 1250 iterations. Thus, cos(X)∗W is the optimal model, considering both regression accuracy and training efficiency.

Summary
Combining the classification and regression tests, we derive the following conclusions. The performances of different CVCNNs vary, but their overall performance is better than that of standard CNNs. Moreover, cos(X)∗W works best, which is consistent with our prediction: the cosinusoidal nonlinear activation function has infinite-order differentiability and improved analytical ability for continuous spectral signals. Furthermore, the element-wise activation decouples the input vector elements, equivalently increasing the number of bases and the analytical power of the neural networks, which is confirmed in the performance comparison with cos(X∗W). Xcos(X)∗W performs slightly worse than cos(X)∗W but still exceeds the standard CNNs. cos(X∗W) is sensitive to quantization accuracy; its performance deteriorates significantly in low-precision scenarios but is suitable for high-precision scenarios. However, the current network scale of CVCNNs is limited by hardware conditions and requires a more comprehensive and in-depth study.
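The parity-switching property of the cosinusoidal activation noted earlier (an even function at zero initial phase, an odd one at a quarter-period offset) can be verified directly:

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 101)   # symmetric grid around 0

even = np.cos(x)                      # zero initial phase: even activation
odd = np.cos(x - np.pi / 2)           # quarter-period phase shift: cos -> sin, odd

assert np.allclose(even, even[::-1])  # f(-x) ==  f(x)
assert np.allclose(odd, -odd[::-1])   # f(-x) == -f(x)
assert np.allclose(odd, np.sin(x))
```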

Conclusion
This study explores hardware and algorithm co-development to exploit the merits of PCNNs. We first designed a photonic neuron, DOFCN, that flexibly implements composite vector convolution using an integrated microcomb, OFDM, and CD. Subsequently, we abstracted DOFCNs as CVC operators and accordingly extended CNNs to CVCNNs. In the MFI and SID tasks, the accuracy and training efficiency of CVCNNs are comparable to those of standard CNNs, and cos(X)∗W-type CVCNNs achieve the best performance in both classification and regression. Unfortunately, we have not yet implemented the arraying and integration of DOFCN, so the potential computing power of the system has not been fully exploited. For this reason, although DOFCN could theoretically achieve fully connected operation, CVCNNs remain optical-electric hybrids, limited by the scale of the current system. Nevertheless, this work demonstrates that neural network models can inspire photonic hardware development and vice versa. Deeper hardware-software integration may be a viable path for the future development of PNNs.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.