Broadband Diffractive Neural Networks Enabling Classification of Visible Wavelengths

Diffractive neural networks (DNNs) are emerging as a new machine learning hardware based on optical diffraction with parallel and high‐throughput information processing. The optical inputs to DNNs are spatially modulated by propagating through passive diffractive layers that work in succession to achieve an inference. Herein, visible wavelength classification using single‐ and two‐layer DNNs fabricated using direct laser writing is demonstrated. The proposed DNN approach accepts the point spread function of two different wavelengths modeled after a microscope objective as the input and modulates the input field toward the target detector for classification. Of the three models trained to classify different wavelength pairs, the highest performance observed is for the classification of 561 and 785 nm, achieving over 90% accuracy. This work demonstrates the potential of all‐optical artificial neural networks for applications requiring visible wavelengths, from visible light beam shaping to spectral analysis and optical imaging.


Introduction
[3][4] Artificial neural networks (ANNs), a subset of ML models, excel at recognizing specific patterns and features from overlapping spectral peaks in raw datasets.[7][8] The architecture of multilayer ANNs contributes to the feature extraction from high-dimensional spectral data. However, such complex ANN models demand high-performance hardware to support intensive computation on large and complex inputs. [9] With the recent introduction of all-optical diffractive neural networks (DNNs), the implementation of ANNs within the optical domain has redefined optical data processing. [10] DNNs perform computation by manipulating the propagation of light through diffraction and interference at individual elements ("neurons") on a stack of 3D-printed diffractive layers. Compared to their electronic counterparts, ANNs in the optical domain offer multiple advantages, including parallelism, real-time inference at the speed of light, and low power consumption.[13][14][15][16] Neurons are essentially spatially localized phase modulators integrated on a diffractive plate (a neural network layer). When light is incident on a diffractive layer, each neuron generates a secondary wave with a phase and amplitude determined by the product of the incident wave and the trainable complex transmission coefficient at that point. These secondary waves form connections between the neurons of each plate through free-space propagation and interference. An input electric field is propagated through the diffractive layers, and an output electric field is recorded in the form of an intensity image after propagation from the final layer. The propagation of the inputs and outputs is modeled within the simulation to train the neurons. When the training is finalized, the transmission coefficients are physically implemented by converting them to heights and fabricating the passive diffractive layers.
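The layer-by-layer computation described above can be sketched numerically. The following is a minimal, simplified example (scalar field, angular spectrum method as a stand-in for the full Rayleigh-Sommerfeld treatment; grid size, neuron pitch, and layer spacing are taken from the Experimental Section, while all function and variable names are illustrative, not from the original implementation):

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, z):
    """Free-space propagation of a complex field over distance z using the
    angular spectrum method (a standard stand-in for the Rayleigh-Sommerfeld
    diffraction integral)."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    fxx, fyy = np.meshgrid(fx, fx)
    arg = 1.0 / wavelength**2 - fxx**2 - fyy**2      # squared longitudinal frequency
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    transfer = np.exp(1j * kz * z) * (arg > 0)       # drop evanescent components
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

# A single diffractive layer: phase-only "neurons" applied to the incident
# field, followed by free-space propagation to the next plane.
n_pix, dx = 120, 0.55e-6           # grid and neuron pitch (Experimental Section)
wavelength, z_gap = 561e-9, 45e-6  # input wavelength and layer spacing
rng = np.random.default_rng(0)
phi = rng.uniform(0, 2 * np.pi, (n_pix, n_pix))    # trained phase map (random here)
field_in = np.ones((n_pix, n_pix), dtype=complex)  # plane-wave input
field_out = angular_spectrum_propagate(field_in * np.exp(1j * phi),
                                       wavelength, dx, z_gap)
intensity = np.abs(field_out) ** 2                 # recorded as an intensity image
```

Because the neurons here are phase-only, the layer redistributes, rather than absorbs, the incident energy; the intensity pattern at the output plane is what a detector would record.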
[10][23] However, shorter wavelengths present a challenge for DNNs in the context of fabricating submicrometer
feature sizes for the diffractive layers. One of the proposed approaches to address this challenge introduces a formula to achieve a 70° half-cone diffraction angle for each neuron relative to the operating wavelength by tuning the neuron size, the number of neurons, and the propagation distance between layers. [24] Given the fabrication and computational constraints of the first two parameters, a significant increase in the spacing between layers was required to achieve the target diffraction angle, thereby increasing the footprint of DNNs. Miniaturized and compact DNN approaches, such as metasurface-based DNNs (MDNNs) and 3D nanofabricated DNNs, have been demonstrated. MDNNs allow multiparametric modulation, encoding polarization, space, and frequency information. [25] However, their complex multistep fabrication, involving electron beam lithography for subwavelength resolution, poses challenges when creating multilayer MDNNs. In contrast, 3D nanofabricated DNNs, based on two-photon polymerization, offer mask-free fabrication and height flexibility for compact multilayer DNNs with submicrometer diffractive elements.
[19,26] In this work, we present an all-optical wavelength classification approach using perceptron-based ANNs. We used our model experimental setup (Figure 1a) to train our ANN to recognize the point spread function (PSF) of each wavelength, modeled after an objective lens, and provide inference at the speed of light by mapping the input to a corresponding detector. Each model is trained to classify different wavelength pairs consisting of pairwise combinations of three wavelengths conventionally used in fluorescence imaging in the visible and NIR: 561, 671, and 785 nm. After the training, each diffractive layer is 3D printed via direct laser writing (DLW). The classification performance of the DNN is analyzed for each wavelength pair based on the energy distribution at the detector. 3D-printed DNNs with the ability to classify visible wavelengths at the speed of light could be applied in areas ranging from medical diagnosis to materials science, enabling rapid screening of samples for one or more target analytes based on the presence of optical extinction bands at certain wavelengths.

Results and Discussion
To demonstrate the wavelength classification capability of DNNs, we chose three wavelength pairs: 561-785, 671-785, and 561-671 nm. For each wavelength pair, we modeled the PSF of each individual wavelength and used them as an input training dataset for the model. Accounting for the misalignment of the focal plane during experiments, the models were assessed with validation datasets containing PSFs from different offset focal planes within the range of 60 ± 1 μm. The output of the model consists of two detectors defined as two areas on the charge-coupled device (CCD) camera. D1 refers to the detector on the top left, corresponding to the shorter wavelength in a wavelength pair (λ1), and D2 refers to the detector on the bottom right, representing the target detector for the longer wavelength (λ2), as shown in Figure 1b. Two models were trained separately for each wavelength pair to compare the performance between single-layer and two-layer DNNs. Similarly, we trained single- and two-layer models for simultaneous three-wavelength (561-671-785 nm) classification, including an additional region D3 in the detector. All models were trained with similar parameters. Once the training was complete, the trained phase delay obtained for each layer was converted into a height map and 3D printed using a custom-built setup. More details on the training parameters, generation of the height maps, and fabrication setup can be found in the Experimental Section. The classification performance of the 3D-printed DNNs was analyzed based on the maximum energy at the detector.

DNN Characterization
We fabricated our DNNs using two-photon polymerization through DLW with a custom system (described in the Experimental Section) that enables 3D printing at submicrometer scales. Circular substrates were fabricated as supporting layers for each diffractive layer, 66 μm × 66 μm in size. A complete two-layer DNN is shown as an example in Figure 2a, with the top diffractive layer shown in Figure 2b. Neurons fabricated with this method had an average diameter of 532 ± 25 nm, within error of their 550 nm nominal diameter, as measured from high-magnification scanning electron microscope (SEM) images (Figure 2c). This uncertainty in neuron size (4.7%) could be detrimental to DNN performance, as neuron diameter is critical for visible wavelength applications. [27] This value arises from the fabrication tolerance, although it may be overestimated due to limitations in the image analysis. Diffractive plates were designed according to the phase plates obtained from the simulation, as described in the Experimental Section. Neuron height is directly related to the phase delay, as shown in Figure 2d, which corresponds to the phase plate used to design the diffractive plate shown in Figure 2b. SEM images and the phase plate for a single-layer DNN, along with additional SEM images of the diffractive plates, can be found in the Supporting Information (Figure S1 and S2).

Numerical Simulations
We have trained two models for all wavelength pairs: single-layer and two-layer DNNs. Similar to conventional neural networks, the performance of DNNs can be improved with multiple diffractive layers. While there is no nonlinearity inherent to these ANNs, DNNs display a "depth" advantage when the number of layers is increased. [28] The intensity loss of the model reduces when the number of layers increases, as demonstrated in Figure S3 (Supporting Information). Additional diffractive layers enable further phase modulation of the optical data, improving accuracy and reducing background noise.
Here, we compared the performance of single-layer and two-layer DNNs for each model in terms of classification accuracy and energy distribution between detectors. The simulated output of the network for the 561-785 nm pair is depicted in Figure 3. The top rows in Figure 3a,c show that the output of the network correctly maps the input wavelength to the target detector for both single-layer and two-layer DNNs. The model outputs for wavelength pairs 561-671 and 671-785 nm are similarly capable of mapping the input PSF to the target detector in the simulation (see Figure S4 and S5 in the Supporting Information). The classification accuracy was 100% for all wavelength pairs, validated with 100 test datasets containing PSFs from different axial planes. The single- and two-layer DNNs demonstrated average energy distributions at the target detection region of 98.5% and 98.8%, respectively. The results of the simultaneous classification of three wavelengths showed 100% classification accuracy for both single-layer and two-layer DNNs, with average energy distributions of 83.7% and 96.3%. The energy distribution across all eight models demonstrates the capability of the network to learn the complex electric field input and map each input wavelength to the allocated detector.

Wavelength Classification with Single-and Two-Layer DNNs
Comparing the experimental results obtained for example single-layer and two-layer models (bottom rows in Figure 3a,c) with the simulation output shows that the experiment is in good agreement with the simulation. Figure 3b,d shows the intensity distribution across the detectors, indicating a higher experimental crosstalk for both single-layer and two-layer DNNs than the values obtained in the simulations. Here, we have considered the total intensity as the sum of the intensity measured at D1 and D2, disregarding the intensity collected at any other areas of the camera. Similarly, the images in Figure 3a,c have been masked to exclusively show the intensity at D1 and D2. Only a small fraction of the total input energy, ranging between 0.22% and 1.33%, is directed to the target detector regions, as calculated from unmasked images (see Figure S6, Supporting Information). These values correspond to the diffraction efficiency of the fabricated DNNs (listed in Table S1, Supporting Information) and are comparable to previously reported values.
[29,30] This increased crosstalk between target detectors likely originates from experimental conditions that were not included in simulations, such as fabrication errors, misalignment of the input from the center of the diffractive plate, and the angle of the incoming beam. The average intensity at the target detectors was similar for both single-layer (Figure 4a) and two-layer DNNs (Figure 4b), as measured with five different samples of each model. Two-layer DNNs showed a slightly worse performance, directing 74.8% of the intensity to the target detector compared to 76.9% for the single-layer DNNs. This small difference can be attributed to the variations encountered during fabrication, which may have a greater impact on two-layer DNNs due to extended fabrication times. Although all DNNs were characterized using SEM after the wavelength classification experiments and found to conform to the design, any variations in the bottom layer would not have been detected. However, this had no impact on the classification accuracy, which was found to be 100% for both single-layer and two-layer models, as shown in Figure 4c,d.
As summarized in Table 1, there was a much greater difference between the single- and two-layer DNN performance in terms of the average intensity at the target detectors for the other wavelength pairs considered in this work. Figure S6 and S7 in the Supporting Information show the comparison between the simulated and the experimental output for the models trained to classify the 561-671 and 671-785 nm wavelength pairs. In the model trained to classify 671 and 785 nm, significant crosstalk between detectors in single-layer DNNs led to a 50% classification accuracy, which increased to 100% for the two-layer DNNs. This resulted from a much higher fraction of the input intensity being directed to the target detector, especially for the 785 nm input wavelength, as detailed in Figure S7 (Supporting Information). The improvement in energy distribution with the two-layer DNN and the resulting improved classification accuracy show that smaller differences between the input wavelengths require a greater number of diffractive layers to learn and distinguish between them. Furthermore, the deviation between the simulation and experimental results signifies the importance of the fabrication quality of the individual neurons on the diffractive layers, where larger neurons would reduce the diffraction efficiency.
The variability within the fabrication becomes especially critical to DNN performance at shorter wavelengths, for which the larger neuron diameter reduces the diffraction efficiency (see Table S1, Supporting Information). This is observed in the model trained to classify 561 and 671 nm input PSFs, as shown in Figure S8 (Supporting Information). Single-layer DNNs consistently misclassified both input wavelengths due to significant crosstalk between detectors for both 561 and 671 nm. Although two-layer DNNs showed a much better performance for the longer input wavelength, this was not the case for 561 nm, which reduced the overall classification accuracy to 50%. At shorter wavelengths and smaller differences in input wavelengths, fabrication variability may play a key role in the experimental model performance. Here, the relatively large uncertainty in the size of the diffractive elements, as discussed earlier, may be the reason for the low performance observed for the 561-671 nm wavelength pair. Besides reducing fabrication variability, increasing the number of diffractive layers may improve the inference performance for this model, as observed with the model trained to classify 671 and 785 nm inputs.
Similarly, reduced fabrication variability and, potentially, an increased number of diffractive layers would enable classifying multiple wavelengths in parallel. Here, single- and two-layer DNNs designed to classify all three wavelengths (561, 671, and 785 nm) in parallel performed poorly, with classification accuracies of 53% and 60%, respectively (Figure S9 and S10, Supporting Information). The poor inference of the two-layer model resulted from the misclassification of both 561 and 785 nm as 671 nm. This indicates that the measured low classification accuracies stem from the addition of a third wavelength class, in addition to the challenges observed for small input wavelength differences, especially at shorter wavelengths. Classification of multiple wavelengths in parallel is likely to require DNNs with a higher number of layers.
However, the characteristics of the polymer used to fabricate the DNNs imposed constraints on the number of layers that could be fabricated. The sol-gel polymer forms a solid outer shell after baking and requires an oil immersion configuration to fabricate structures at the required spatial resolution, which limits the capacity for multilayer fabrication. With the DLW experimental configuration used in this work, DNNs with more than two layers could not be fabricated. Adding a third layer would increase the height of the DNN by 45 μm, following the propagation distance used in the simulation. The total height of three DNN layers would be 135 μm, exceeding the total working distance of the high numerical aperture (NA) microscope objective within the system. Whilst larger structures could still be realized, the resolution becomes insufficient to fabricate neurons with dimensions of 550 nm (Figure S11, Supporting Information). This could potentially be addressed using a different polymer that enables a dip-in configuration during fabrication.

Conclusion
We have presented an all-optical wavelength classification approach through perceptron-based ANNs to enable fast and passive classification using one or two layers of DNNs. We have shown that DNNs with submicrometer features fabricated by DLW can distinguish the unique input PSFs of different wavelengths and map each input to a target detector. However, the results deviate from those predicted by the numerical simulations for higher-energy visible wavelength pairs. This can be attributed to the fabrication limitations affecting the size of the individual neurons and, subsequently, the diffraction efficiency. In future work, increasing the number of diffractive layers to enable further modulation for shorter wavelengths could potentially improve classification accuracy and enable more wavelengths to be classified using a single model. [33] Introducing a period between meta-atoms reduces the proximity effect from two-photon polymerization, resulting in an improved resolution that would be critical for applications involving shorter visible wavelengths. The robustness of the model could further be improved by accounting for the misalignment of the input PSF from the center of the diffractive layer and the angle of the incoming beam. These additions can play a critical role in realizing complex spectral analysis for future applications.

Experimental Section
Point Spread Function and Optical Propagation Simulation: Our wavelength classification system was modeled with three components. At the input of the network was the simulated PSF of a 0.13 NA objective lens. The PSFs of each wavelength (561, 671, and 785 nm) were generated following the fast Fourier transform (FFT) of the vectorial Debye theory. [26,34,35] The complex electric field for x-polarized plane wave illumination across the pupil plane was represented by

E(x, y, z) = FFT{exp(−ikzn cos θ)}

Propagation of the incident field through the diffractive layers was modeled after the Rayleigh-Sommerfeld diffraction equation

w(x, y, z) = (z / r²) (1 / (2πr) + 1 / (jλ)) exp(j2πr / λ)

where r = √(x² + y² + z²) and λ is the illuminating wavelength. Each diffractive layer l, at position z_l, comprised individual pixels with trainable transmission coefficients that modulated the phase of the optical field. The transmission coefficient of each layer could be written as

t_l = exp(jϕ(x, y, z_l))

Machine Learning Training: Each layer in the model contained 120 × 120 pixels with a lateral dimension of 66 × 66 μm. Circular pixels with a diameter of 0.55 μm were implemented in the simulation owing to the minimum pixel size that could be fabricated with the custom DLW system. The propagation distance between layers was set to 45 μm to ensure adequate communication between neurons whilst maintaining feasibility for multilayer fabrication.
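This propagation model can be sketched compactly in code. The following is a simplified scalar implementation with illustrative discretization (the sampled Rayleigh-Sommerfeld kernel is convolved with the modulated field via FFT; function names are ours, not from the original implementation):

```python
import numpy as np

def rs_kernel(n_pix, dx, z, wavelength):
    """Sampled Rayleigh-Sommerfeld response w(x, y, z) on an n_pix x n_pix grid:
    w = (z/r^2) * (1/(2*pi*r) + 1/(j*lambda)) * exp(j*2*pi*r/lambda)."""
    x = (np.arange(n_pix) - n_pix // 2) * dx
    xx, yy = np.meshgrid(x, x)
    r = np.sqrt(xx**2 + yy**2 + z**2)
    return (z / r**2) * (1 / (2 * np.pi * r) + 1 / (1j * wavelength)) \
        * np.exp(1j * 2 * np.pi * r / wavelength)

def propagate_through_layer(field, phi, dx, z, wavelength):
    """Apply the layer transmission t_l = exp(j*phi), then propagate a distance z
    by FFT-based convolution with the Rayleigh-Sommerfeld kernel."""
    modulated = field * np.exp(1j * phi)            # phase-only modulation
    w = rs_kernel(field.shape[0], dx, z, wavelength)
    return np.fft.ifft2(np.fft.fft2(modulated)
                        * np.fft.fft2(np.fft.ifftshift(w))) * dx**2
```

Chaining `propagate_through_layer` calls, one per diffractive layer, yields the field at the output plane, whose squared magnitude is the intensity recorded at the detectors.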
The training and validation datasets comprised the PSFs for each wavelength and labels corresponding to the allocated detector. The PSFs were generated with a small axial offset (60 μm) from the network to illuminate more neurons of the first layer. Each wavelength was assigned to a detector with a size of 6.6 × 6.6 μm.
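A simplified sketch of generating such defocused inputs follows. Note that this uses a scalar pupil-FFT approximation rather than the full vectorial Debye calculation used in this work, with an immersion index of 1 assumed for the low-NA air objective; the function name and defaults are illustrative:

```python
import numpy as np

def scalar_defocused_psf(wavelength, na, z, n_pix=120, dx=0.55e-6):
    """Scalar approximation of the defocused PSF of a low-NA objective via the
    pupil-FFT method (stand-in for the vectorial Debye theory)."""
    fx = np.fft.fftfreq(n_pix, d=dx)
    fxx, fyy = np.meshgrid(fx, fx)
    sin2_theta = (fxx**2 + fyy**2) * wavelength**2
    pupil = (sin2_theta <= na**2).astype(complex)     # NA-limited circular aperture
    cos_theta = np.sqrt(np.clip(1.0 - sin2_theta, 0.0, None))
    defocus = np.exp(-1j * 2 * np.pi * z * cos_theta / wavelength)  # exp(-ikz cos(theta))
    field = np.fft.fftshift(np.fft.ifft2(pupil * defocus))
    return np.abs(field) ** 2

# Inputs for one wavelength pair at the 60 um axial offset used for training
psf_561 = scalar_defocused_psf(561e-9, 0.13, 60e-6)
psf_785 = scalar_defocused_psf(785e-9, 0.13, 60e-6)
```

The two wavelengths produce distinct defocused ring patterns on the first layer, which is the feature the network learns to separate.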
The training of the diffractive network was implemented with TensorFlow (v2.4.1) and Python (v3.8.10) on a graphics processing unit (NVIDIA Quadro K2200). The classification model was trained using a batch size of 1 for 100 epochs, with the dataset split into 4000 training and 100 validation PSFs of two wavelengths. During training, the error between the target output and the predicted output was calculated using the mean squared error function. This error was backpropagated using the stochastic gradient-based optimizer Adam, [36] with a learning rate of 0.02. The loss for all models plateaus after 100 epochs, suggesting that there is no further benefit in training the models for more epochs (see Figure S12 in the Supporting Information).
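The training loop can be sketched as follows. This is a heavily reduced example, not the original code: a single trainable phase layer on a small grid, with a normalized FFT standing in for Rayleigh-Sommerfeld propagation, trained with the MSE loss and Adam optimizer (learning rate 0.02) named above; the target pattern and all names are illustrative:

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)
n_pix = 32  # reduced grid for a quick demonstration (the paper uses 120 x 120)

# Trainable phase map of a single diffractive layer
phi = tf.Variable(tf.random.uniform((n_pix, n_pix), 0.0, 2.0 * np.pi))
optimizer = tf.keras.optimizers.Adam(learning_rate=0.02)  # as in the paper

# Target: concentrate energy in one detector region of the output plane
target = np.zeros((n_pix, n_pix), dtype=np.float32)
target[4:10, 4:10] = 1.0
target = tf.constant(target)
field_in = tf.ones((n_pix, n_pix), dtype=tf.complex64)

def forward(field):
    """Phase modulation followed by a normalized FFT, used here as a crude
    stand-in for free-space propagation to the detector plane."""
    modulated = field * tf.exp(tf.complex(tf.zeros_like(phi), phi))
    far_field = tf.signal.fft2d(modulated) / n_pix
    return tf.abs(far_field) ** 2

def train_step():
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean((forward(field_in) - target) ** 2)  # MSE loss
    grads = tape.gradient(loss, [phi])
    optimizer.apply_gradients(zip(grads, [phi]))
    return float(loss)

losses = [train_step() for _ in range(50)]
```

Since the layer is phase-only, the gradient reaches the trainable variable `phi` through the complex exponential and the FFT, and the loss decreases as energy is steered into the target region.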
The performance of the model was assessed based on the energy distribution matrix and the confusion matrix. For every predicted output, the energy distribution between the detectors was calculated by multiplying the output with two masks, each corresponding to the detector assigned to a wavelength. This distribution is translated into a classification by identifying the detector with the maximum energy.
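The mask-and-argmax readout can be sketched in a few lines. A minimal example, with detector positions and sizes chosen for illustration (6.6 × 6.6 μm regions on a 0.55 μm pixel grid give 12 × 12 pixel masks; function and variable names are ours):

```python
import numpy as np

def classify_by_detector(intensity, masks):
    """Return the index of the detector region with maximum integrated energy,
    plus the energy distribution normalized over the detector regions."""
    energies = np.array([(intensity * m).sum() for m in masks])
    return int(np.argmax(energies)), energies / energies.sum()

# Two 6.6 x 6.6 um detectors on a 120 x 120 grid (0.55 um pixels -> 12 x 12)
n, d = 120, 12
m1 = np.zeros((n, n)); m1[20:20 + d, 20:20 + d] = 1      # D1, top-left
m2 = np.zeros((n, n)); m2[-20 - d:-20, -20 - d:-20] = 1  # D2, bottom-right
img = np.zeros((n, n)); img[24:30, 24:30] = 1.0          # energy lands near D1
label, dist = classify_by_detector(img, [m1, m2])
# label == 0 (D1); dist == [1.0, 0.0] for this toy input
```

Accumulating `label` over a validation set against the known input wavelengths yields the confusion matrix, while averaging `dist` yields the energy distribution matrix.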
Nanofabrication: The transmission coefficients of the diffractive layers were converted into heights following h = ϕλ_c / (2πΔn), where λ_c is the average wavelength of the two input wavelengths, Δn is the refractive index difference between the polymer and air, and ϕ is the phase value of each pixel. Phase wrapping was applied in this calculation to increase the stability of the diffractive elements. The fabrication of the supporting structure and the individual diffractive elements was implemented with DLW based on two-photon polymerization. Specifically, a hybrid zirconium organic-inorganic photoresist was drop cast onto a coverslip with a thickness of 0.17 mm and then baked for 1 h at 60 °C. After exposure, the samples were immersed in a 50:50 1-propanol:2-propanol solution for a day and then blow dried with a nitrogen gun. The parameters used to fabricate both components can be found in Table S2 (Supporting Information).
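The phase-to-height conversion with phase wrapping can be written directly. A short sketch, in which the index contrast Δn is an assumed illustrative value (the actual contrast of the zirconium photoresist is not stated here):

```python
import numpy as np

def phase_to_height(phi, lambda_c, delta_n):
    """Convert trained phase delays to printed heights, h = phi*lambda_c/(2*pi*delta_n),
    with phase wrapping into [0, 2*pi) to keep the printed structures short."""
    phi_wrapped = np.mod(phi, 2 * np.pi)
    return phi_wrapped * lambda_c / (2 * np.pi * delta_n)

lambda_c = 0.5 * (561e-9 + 785e-9)  # average wavelength of the 561-785 nm pair
delta_n = 0.5                        # assumed polymer-air index contrast (illustrative)
h = phase_to_height(np.array([0.0, np.pi, 3 * np.pi]), lambda_c, delta_n)
# a phase of 3*pi wraps to pi, so the second and third heights coincide
```

Wrapping caps the maximum printed height at λ_c/Δn, which keeps the high-aspect-ratio diffractive elements mechanically stable.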
A diagram of the custom fabrication setup can be found in the Supporting Information (Figure S13). The system consisted of a 1070 nm femtosecond fiber-coupled laser (Coherent Fidelity II, 55 fs pulse width, 70 MHz repetition rate) coupled to a frequency doubler (APE HarmoniXX). A 100×, 1.4 NA oil immersion objective was used to form a tight voxel within the photoresist. The microstructures were traced with a combination of a piezoelectric nanostage (P-545.xR8S PInano XY(Z) Piezo System, Physik Instrumente) and a pair of galvo mirrors.
Characterization: The printed DNNs were characterized with a reflectance optical microscope (Nikon LV150N, 20×, 0.45 NA) to assess the quality of the fabrication prior to the experiments. After the wavelength classification experiments, the DNNs were sputter coated with a 10 nm-thick layer of iridium and examined in an SEM (FEI Verios 460L XHR-SEM). To visualize the full DNN structure, including the supporting pillars of multilayer structures, low-magnification (1200×) SEM images were obtained at a stage tilt of 35°. High-magnification (25 000×) images at no stage tilt were collected for neuron size analysis, which was performed using ImageJ to measure the distance between the centers of adjacent neurons. The average neuron diameter was measured over 145 neurons from five different DNNs.
Wavelength Classification Experiment: Three continuous-wave lasers were used to illuminate the DNN: a Ventus 671 and Coherent OBIS 561 and 785 lasers. Two long-pass dichroic mirrors were inserted into the path of the 785 nm laser to align all three laser beams into a single optical path. An objective lens (Olympus UPLANFL N, 4×, 0.13 NA) mounted on a translation stage was used to implement the axial offset from the DNN. The sample was mounted on a piezoelectric nanostage (P-545.xR8S PInano XY(Z) Piezo System, Physik Instrumente) for fine alignment along the x, y, and z axes. The image at the output was captured by a 100× oil immersion objective lens (Olympus UPlanSApo, 1.4 NA), a tube lens, and a CCD camera (Basler ace acA2040-90uc).
For each classification model, the output from five different samples was collected under the same exposure and gain conditions. With the microscope configuration detailed earlier, the raw output was captured by the CCD camera with a sensor area of 2040 × 2046 pixels and a pixel size of 5.5 × 5.5 μm. During digital postprocessing, the raw images were multiplied by a mask of 1496 × 1496 pixels corresponding to the target detector, resized relative to the magnification of the imaging system.

Figure 1 .
Figure 1. Experimental setup schematic for wavelength classification using a DNN. a) A continuous-wave laser (561, 671, or 785 nm) is focused onto the diffractive layers using a 4× microscope objective. A 100× objective collects the light at the output of the DNN and focuses it onto a CCD camera. b) Conceptual diagram of an input beam propagating through diffractive layers with different phase modulations toward the target detector at the output.

Figure 2 .
Figure 2. Fabricated two-layer DNNs for wavelength classification. a) SEM image of a two-layer DNN fabricated through two-photon polymerization via DLW. The top diffractive layer is shown in (b), comprising the individual neurons depicted in (c). d) Numerical simulation of the phase delay corresponding to the diffractive layer shown in (b). Scale bars are 25 μm in (a) and (b), and 1.5 μm in (c).

Figure 3 .
Figure 3. Comparison between simulated and experimental output patterns from the model trained to classify 561 and 785 nm. Output intensity at the target detector is compared between single-layer and two-layer DNNs for input PSFs collected at a 60 μm axial offset from the focal spot for a) 561 nm and c) 785 nm. b,d) Energy distribution percentages at each detector region, calculated by multiplying the output with the target detector mask.

Figure 4 .
Figure 4. Performance of single- and two-layer DNNs. Intensity distribution at the detectors for a) single- and b) two-layer DNNs for the classification of 561 and 785 nm. Five DNNs of each type were used to generate these data. c,d) Confusion matrices for the single- and two-layer DNNs extracted from the intensity distributions.

Table 1 .
Experimental performance comparison for different wavelength pairs. Percentages of energy distribution at the target detectors are calculated relative to the total energy at both detectors, averaged over five samples.