Multispectral Quantitative Phase Imaging Using a Diffractive Optical Network

As a label‐free imaging technique, quantitative phase imaging (QPI) provides optical path length information of transparent specimens for various applications in biology, materials science, and engineering. Multispectral QPI measures quantitative phase information across multiple spectral bands, permitting the examination of wavelength‐specific phase and dispersion characteristics of samples. Herein, the design of a diffractive processor is presented that can all‐optically perform multispectral quantitative phase imaging of transparent phase‐only objects within a snapshot. The design utilizes spatially engineered diffractive layers, optimized through deep learning, to encode the phase profile of the input object at a predetermined set of wavelengths into spatial intensity variations at the output plane, allowing multispectral QPI using a monochrome focal plane array. Through numerical simulations, diffractive multispectral processors are demonstrated to simultaneously perform quantitative phase imaging at 9 and 16 target spectral bands in the visible spectrum. The generalization of these diffractive processor designs is validated through numerical tests on unseen objects, including thin Pap smear images. Due to its all‐optical processing capability using passive dielectric diffractive materials, this diffractive multispectral QPI processor offers a compact and power‐efficient solution for high‐throughput quantitative phase microscopy and spectroscopy.

A typical QPI device is composed of two primary components: (1) an imaging front-end responsible for conducting optical interferometry to convert the desired phase information into intensity signals that can be captured using a digital image sensor, and (2) a digital processing back-end tasked to perform the essential image processing and reconstruction of quantitative phase images based on these signals. Recently, propelled by advancements in deep learning, the versatility and performance of the digital back-ends used in QPI devices have been extensively enhanced. For instance, by leveraging the massively parallel computing power of graphics processing units (GPUs), the image reconstruction speed and throughput of QPI devices have seen substantial improvements [30][31][32]. Moreover, deep learning algorithms can be utilized to interpret QPI images to accomplish advanced tasks such as the classification and detection of specific target objects [33][34][35], while also assisting in solving inverse problems in QPI, which include phase retrieval [31,32,36-41], aberration correction [42,43], depth-of-field extension [44,45] and image modality transformations [18,46]. By and large, commonly employed QPI devices utilize a monochromatic light source, thereby providing phase information corresponding to a single wavelength. Building upon this, multispectral QPI systems [47][48][49][50][51] have also been developed to simultaneously acquire a sample's phase information across multiple spectral bands. These advancements facilitated the detection of samples' dispersion signatures and their spatial distribution, finding use in diverse areas such as quantifying hemoglobin concentration in erythrocytes [47,51] and mapping the spatial refractive index distribution of the retina [52]. However, these existing multispectral QPI techniques require bandpass filters (e.g., color filter wheels) or a tunable laser source to obtain the desired QPI signals at individual wavelengths. This can introduce considerable cost and complexity to the QPI system hardware, enlarge its footprint, and potentially result in chromatic aberrations in the imaging system if left uncorrected. Moreover, the digital processing of the resulting multispectral holographic images further adds to the computational load of the digital back-end.
In this work, we introduce a multispectral QPI system designed based on a broadband diffractive optical neural network, which can simultaneously obtain the quantitative phase profiles of dispersive phase objects across a wide range of spectral bands within a single snapshot. This diffractive multispectral QPI processor comprises a series of spatially engineered dielectric diffractive layers, collectively forming an optical neural network [53][54][55][56][57][58][59][60][61][62][63][64][65], as illustrated in Figure 1. After its optimization through deep learning, this optical network transforms the multispectral phase information of input objects into a distinct intensity distribution at the output field-of-view (FOV) that spatially encodes the object phase information corresponding to each target spectral band separately. This intensity distribution can be measured using a monochrome image sensor array and, following a standard demosaicing and normalization process, results in quantitative phase images of the input object at the target spectral bands of interest. Our diffractive QPI processor design has a very compact footprint that axially spans only ~100 λ_m, where λ_m represents the mean wavelength of the target spectral bands. It incorporates passive optical elements, and can instantaneously perform end-to-end multispectral QPI directly on an input object or scene, offering a powerful alternative to traditional digital QPI frameworks that depend on specialized hardware for the optical relay, spectral filtering and digital phase recovery steps.
To numerically demonstrate the feasibility of our concept, we trained two diffractive designs that can perform multispectral QPI at 9 and 16 unique spectral bands, evenly distributed within the visible spectrum from 450 to 700 nm. Both of these designs utilize 10 successive diffractive layers that are laterally engineered at a feature size of ~225 nm, forming compact diffractive systems axially spanning ~64.8 μm. The performance of these diffractive multispectral QPI designs was tested based on numerical simulations, demonstrating average peak signal-to-noise ratio (PSNR) values of >17.0 dB and >16.6 dB and structural similarity index measure (SSIM) values of >0.77 and >0.72 for the 9- and 16-channel designs, respectively. Our cross-talk analyses also indicate that, for a given group of sensor pixels designated to a target spectral band, the optical power received from the target spectral band was, on average, more than 7-fold higher than the mean power received from the other wavelengths (representing cross-talk), showcasing the success of our spectral filtering performance. We also quantified the spatial resolution of our QPI designs to reveal that our 9- and 16-channel diffractive designs could resolve phase features with a linewidth of ≥5.4 μm and ≥7.2 μm, respectively. The external generalization performance of our diffractive multispectral QPI designs was further validated through blind tests using images of Pap smear samples, showcasing their versatility to accommodate a wide range of distinct spatial features never seen in the training of the diffractive networks.
Our diffractive multispectral QPI design is also scalable according to the operating wavelengths. Although we employed the visible spectrum for the numerical testing of our diffractive multispectral QPI designs, they can be adapted for operation at other parts of the electromagnetic spectrum by simply scaling their dimensions proportional to the new spectral band of operation; this intrinsic scalability offers particular value for spectral regions where high-density, large-area spectral filter arrays are not widely available or are cost-prohibitive. Moreover, since the diffractive layers in our designs use isotropic dielectric materials, their quantitative phase imaging operation is insensitive to the input polarization state of the illumination light. By amalgamating these unique advantages, our diffractive multispectral QPI framework offers transformative opportunities for designing quantitative phase microscopy and spectroscopy systems. These diffractive systems can be integrated with monochrome optoelectronic sensor arrays or focal plane arrays at different parts of the spectrum, resulting in high-throughput on-chip imaging and sensing devices that can find various applications in, e.g., biomedical imaging, materials science, and environmental monitoring.

RESULTS
Figure 1 illustrates a schematic of our diffractive multispectral QPI framework. As shown in Figure 1a, a dispersive transparent sample is illuminated by broadband spatially coherent light at the input plane of the system. This broadband illumination can be viewed as a set of plane waves at distinct wavelengths {λ_1, λ_2, …, λ_{N_w}}, organized in order from the longest to the shortest wavelength. Situated after the input plane, the broadband diffractive network comprises multiple modulation layers made of dielectric materials, where each diffractive layer is coded with the same number of spatially engineered diffractive features that have a lateral size of ~λ_{N_w}/2 (half of the shortest wavelength) and a trainable/learnable thickness, providing a phase modulation range covering 0-2π for all the illumination wavelengths. These diffractive layers are connected to each other and the input/output planes through light diffraction in free space (air).
As outlined in Figure 1b, the optical fields {u_m} (m ∈ {1, 2, …, N_w}) immediately following a phase-only transmissive object exhibit unique phase profiles {φ_m} at each transmitted wavelength λ_m, and can be represented as {u_m = exp(jφ_m)} (φ_m ∈ ℝ^{N_x^(ψ) × N_y^(ψ)}). These complex fields at different wavelengths are simultaneously processed by the diffractive deep neural network (D²NN) to produce output fields {o_m}, i.e., o_m = D²NN{u_m}. The intensity profiles of these outputs are subsequently sampled by a monochrome image sensor that integrates all the wavelengths, resulting in an output optical intensity measurement I that can be expressed as:

I(x, y) = Σ_{m=1}^{N_w} |o_m(x, y)|²   (1)

Without loss of generality, we assume unit illumination power for all the wavelengths of interest. Considering the fact that the output optical intensity I measured by the image sensor is generally coupled to the illumination power and the output diffraction efficiency, the values of I cannot be immediately used to represent the quantitative phase values of the input object. To provide QPI performance that is independent of such power-related output fluctuations, we employed a simple normalization scheme, where we spatially divided the output measurements (I) into an output signal region S and a reference signal region R to obtain the final quantitative phase measurements by calculating the relative ratio of the signals acquired within S and R. As illustrated in Figure S1, Supporting Information, R is defined as a one-pixel-wide frame located at the edges of S, and can be further subdivided into N_w sub-regions, each denoted as R_m (m ∈ {1, 2, …, N_w}). Each R_m creates a wavelength-specific reference signal r_m, i.e.,

r_m = (1/N(R_m)) Σ_{(x,y)∈R_m} I(x, y)   (2)

where N(R_m) denotes the total number of image sensor pixels located within R_m.
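As a minimal numerical sketch of these two measurement steps, the wavelength-integrated sensor reading (Equation 1) and the per-band reference signals (Equation 2) can be written in a few lines of NumPy. The function names and array shapes are our own illustrative choices, and the reference sub-regions are assumed to be supplied as boolean masks over the sensor plane:

```python
import numpy as np

def sensor_intensity(output_fields):
    """Equation 1: a monochrome sensor integrates the output intensities
    of all N_w wavelength channels into a single measurement I.
    output_fields: complex array of shape (N_w, H, W)."""
    return np.sum(np.abs(output_fields) ** 2, axis=0)

def reference_signals(I, ref_masks):
    """Equation 2: each reference sub-region R_m (a boolean mask over the
    sensor plane) yields a wavelength-specific mean intensity r_m."""
    return np.array([I[mask].mean() for mask in ref_masks])
```

Because the sensor is wavelength-blind, only the sum over channels is ever observed; the diffractive layers must therefore pre-sort the spectral content spatially before this integration happens.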
The obtained pixel intensities within S, i.e., I_S, are spatially divided into N_x^(ψ) × N_y^(ψ) blocks, where each repeating block contains √N_w × √N_w pixels individually assigned to the N_w target spectral bands according to the pixel-wavelength mapping relationship illustrated in Figure 1b. Accordingly, a spatial demosaicing operation D_m[·] is used to extract these pixel intensities in an interleaved manner, resulting in N_w quantitative phase images, each corresponding to a unique spectral band λ_m, m ∈ {1, 2, …, N_w}; see the Methods section for details. This demosaicing operation is similar to extracting the color image components from a raw image obtained with a Bayer filter-based image sensor. Finally, after being normalized using the reference signals r_m, the quantitative phase image ψ_m of the diffractive processor can be obtained:

ψ_m(x, y) = D_m[I_S](x, y) / r_m   (3)
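The Bayer-style extraction and reference normalization described above can be sketched as follows. The row-major ordering of channels within each repeating block is our own assumption for illustration; the actual pixel-wavelength mapping is defined in Figure 1b:

```python
import numpy as np

def demosaic(I_S, n_w):
    """Extract the n_w interleaved channel images from the output signal
    region: channel m occupies a fixed position inside each repeating
    sqrt(n_w) x sqrt(n_w) pixel block (Bayer-style sampling)."""
    b = int(round(np.sqrt(n_w)))
    assert b * b == n_w, "n_w must be a perfect square"
    return [I_S[i::b, j::b] for i in range(b) for j in range(b)]

def quantitative_phase_images(I_S, r, n_w):
    """Equation 3 (as we read it): each demosaiced channel image is
    normalized by its wavelength-specific reference signal r_m."""
    return [D_m / r_m for D_m, r_m in zip(demosaic(I_S, n_w), r)]
```

For the 9-channel design, an 84 × 84 signal region would thus yield nine 28 × 28 phase images, matching the reported input resolution.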
If our diffractive multispectral QPI network training is successful, we will have:

ψ_m(x, y) ≈ φ_m(x, y), for all m ∈ {1, 2, …, N_w}   (4)

Stated differently, the role of our diffractive multispectral QPI processor is to route the phase information of an input object at N_w unique wavelengths into a periodically repeating virtual array of √N_w × √N_w pixels, each covering one wavelength of interest. This multispectral QPI processor forms an all-optical transformer that simultaneously performs (1) spectral-to-spatial and (2) phase-to-intensity transformations, serving the functions of both multispectral filtering and quantitative phase retrieval. More details about this processing pipeline and related mathematical expressions are provided in Figure 1b and the Methods section.
For the training of our diffractive multispectral QPI processor, we prepared a training image dataset composed of 110,000 images, containing 55,000 handwritten digit images and 55,000 custom-created grating/fringe-like patterns. These structural patterns, including but not limited to gratings, patches, and circles, were created in an earlier work of our group [66], designed to augment the training data. During the training process, these images are randomly selected and encoded into the phase channels φ_m of the multispectral input fields using a dynamic range of [0, α_tr·π].
Considering the predominantly binary nature of the training images, we set the phase contrast parameter (α_tr) of each input training image to be randomly sampled within [0.1, 1], thus ensuring wide coverage of different phase contrast values from α_tr,min = 0.1 to α_tr,max = 1.
Error-backpropagation and stochastic gradient descent were used to optimize the thickness profiles of the diffractive layers, with the objective of minimizing a custom loss function ℒ defined based on the mean-squared error (MSE) between the diffractive output quantitative phase images and their ground truth across all the wavelength channels, i.e.,

ℒ = (1/N_w) Σ_{m=1}^{N_w} MSE(ψ_m, ψ_m^(GT))

Here ψ_m represents the normalized optical intensity distribution at a target spectral band λ_m within the output FOV of the diffractive imager, and ψ_m^(GT) denotes the ground truth counterpart of ψ_m.
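A channel-averaged MSE loss of this kind can be sketched as below. The optional per-channel weights are our own illustrative addition, anticipating the adaptive loss balancing mentioned in the Discussion; the unweighted case matches the loss stated above:

```python
import numpy as np

def qpi_loss(psi, psi_gt, weights=None):
    """MSE between the diffractive output phase images and their ground
    truth, averaged over all N_w spectral channels.
    psi, psi_gt: arrays of shape (N_w, H, W). Optional per-channel
    weights allow rebalancing underperforming spectral bands."""
    per_channel = np.mean((psi - psi_gt) ** 2, axis=(1, 2))
    if weights is None:
        return per_channel.mean()
    w = np.asarray(weights, dtype=float)
    return np.sum(w * per_channel) / np.sum(w)
```

In an actual training loop this scalar would be computed inside an autodiff framework so that its gradient with respect to the layer thicknesses can drive the update step.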
Further details regarding the training can be found in the Methods section.
As a proof-of-concept, we numerically demonstrate diffractive multispectral QPI processor designs that operate in the visible spectral range, i.e., λ_{N_w} = 450 nm and λ_1 = 700 nm. We trained two multispectral QPI designs, which mainly differ in their number of target spectral bands N_w and input spatial resolution (N_x^(ψ) × N_y^(ψ)). The first design features N_w = 9 with N_x^(ψ) × N_y^(ψ) = 28 × 28, while the second design has N_w = 16 with N_x^(ψ) × N_y^(ψ) = 12 × 12 (see the Methods for details). Both of these diffractive designs are composed of 10 diffractive layers, where each diffractive layer has 1,000 × 1,000 and 784 × 784 diffractive features for the designs with N_w = 9 and 16, respectively. The entire diffractive volume spans an axial length of ~112.7 λ_m = 64.8 μm (where λ_m = 575 nm is the mean wavelength of the target spectral range) and a lateral size of 225 μm (176.4 μm) for the diffractive design with N_w = 9 (16), forming a compact system that can be monolithically integrated with a CMOS image sensor. Without loss of generality, at the output plane of these diffractive designs, a monochrome image sensor with a pixel size of 1.8 μm × 1.8 μm (~3.13 λ_m × 3.13 λ_m) is assumed. A unit magnification is assumed between the object/input plane and the monochrome output/sensor plane, resulting in the same size of the output signal region S as the input FOV.
After the training process was complete, the resulting diffractive layer thickness profiles of the two designs are presented in Figure 2a and Figure S2a, Supporting Information. To assess the multispectral QPI performance of these diffractive designs, we used a test set of 5,000 phase-only objects, wherein each object was created by randomly selecting images from the MNIST test set and encoding them into the multispectral object phase channels using a dynamic range of [0, α_test·π].
Two commonly used image quality metrics, SSIM and PSNR, were used to compare the resulting diffractive QPI measurements (ψ_m) and their ground truth counterparts (ψ_m^(GT)) for all the target spectral bands. We summarized the resulting QPI performance under α_test = 1 for the two diffractive processor designs with N_w = 9 and 16 in Figure 2b and Figure S2b, Supporting Information, respectively. The SSIM and PSNR values for the N_w = 9 design were calculated as 0.770 ± 0.015 and 17.04 ± 0.33 dB, respectively, while for the second design with N_w = 16 these same metrics were found to be 0.726 ± 0.031 and 16.67 ± 0.43 dB, respectively.
We also visualized examples of these blind testing results in the bottom two rows of Figure 3a and Figure S3a, Supporting Information, for the two diffractive processor designs with N_w = 9 and 16, respectively. The diffractive quantitative phase images clearly exhibit high structural fidelity matching their ground truths, further confirming the success of our designs. Additionally, we also present an image matrix that visualizes all the contributions from each channel of the input phase profile φ_m under illumination λ_m to each of the output QPI measurement channels ψ_{m'}. These individual contributions are denoted as I_{m,m'}, and can be expressed using the following form:

I_{m,m'} = D_{m'}[|o_m|²]   (5)

which denotes the intensity images measured by the array of monochrome sensor pixels corresponding to the channel ψ_{m'} using illumination at λ_m. Due to the spectral integration process occurring at the monochrome image sensor, as indicated in Equation 1, all the I_{m,m'} values in the same column will add up to form ψ_{m'}. Our results shown in Figure 3a and Figure S3a, Supporting Information, demonstrate that the input phase profiles of each spectral band are successfully converted into the correct intensity distributions within their corresponding output QPI images (i.e., the diagonal entries with m = m'); there are also some weak cross-talk terms corresponding to the other spectral bands (i.e., the off-diagonal entries with m ≠ m').
To better quantify the impact of this spectral cross-talk on our multispectral QPI results, we calculated the mean optical power of I_{m,m'} using the diffractive design with N_w = 9 over the entire testing set, and summarized the results into the confusion matrices reported in Figure 3b and c. A similar cross-talk analysis for the other diffractive QPI design with N_w = 16 is reported in the confusion matrices presented in Figure S3b and c, Supporting Information. In all of these analyses and the corresponding confusion matrices, each row represents a distinct illumination wavelength (λ_m), while the columns correspond to individual sensor pixel groups (S_{m'}), each designated to a particular spectral band λ_{m'}. In the confusion matrices presented in Figure 3b and Figure S3b, Supporting Information, the optical power values, presented as percentages, are normalized using the total power of a specific illumination wavelength received by the entire sensor signal region (S), resulting in a summation of 100% in each row. Based on these analyses, we found that, for the two diffractive designs with N_w = 9 and 16, when a given illumination λ_m is used, the power received by its corresponding sensor pixel group designated to this wavelength (corresponding to the diagonal entries I_{m,m'}|_{m=m'}) was, on average, (7.38 ± 1.24)- and (9.95 ± 1.93)-fold greater, respectively, than the mean power received by the other sensor pixel groups (corresponding to spectral power leakage, I_{m,m'}|_{m≠m'}). Therefore, the amount of energy directed to incorrect sensor pixel groups is minimal (as desired) compared to that directed to the intended group of pixels. Similarly, the confusion matrices in Figure 3c and Figure S3c, Supporting Information, present another view on the magnitudes of these cross-talk components. Here, the normalization is performed using the total power received by a particular sensor pixel group, leading to a column-wise summation of 100%. From this perspective, given a group of sensor pixels S_{m'}, the power received at its designated target spectral band (corresponding to I_{m,m'}|_{m=m'}) is found to be on average (7.32 ± 0.19)- and (9.92 ± 0.38)-fold larger, respectively, than the mean power of the other spectral bands (corresponding to spectral power leakage, I_{m,m'}|_{m≠m'}) for the two designs with N_w = 9 and 16, indicating that these spectral-spatial cross-talk components produced by our diffractive QPI designs also minimally affect their signal contrast. We also created an alternate version of these confusion matrices resulting from a global normalization, using the total optical power received across the entire sensor signal region (S) for all the spectral bands, which is provided in Figure S4, Supporting Information, further supporting the same conclusions.
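The two diagonal-to-off-diagonal power ratios reported above can be computed from a confusion matrix with a short helper. This is a sketch with our own naming; the matrix entries would come from averaging the channel cross-talk powers over the whole test set:

```python
import numpy as np

def crosstalk_ratios(P):
    """From a power confusion matrix P[m, mp] (power of illumination
    wavelength m collected by the pixel group assigned to band mp),
    compute the diagonal-to-mean-off-diagonal ratios used in the text:
    row-wise -> how efficiently each wavelength is routed to its own
    pixel group; column-wise -> signal contrast of each pixel group."""
    n = P.shape[0]
    off = ~np.eye(n, dtype=bool)  # True everywhere except the diagonal
    row = np.array([P[m, m] / P[m, off[m]].mean() for m in range(n)])
    col = np.array([P[m, m] / P[off[:, m], m].mean() for m in range(n)])
    return row, col
```

Averaging `row` and `col` over the bands would then yield the single summary figures (e.g., ~7.4-fold and ~7.3-fold for the 9-channel design) quoted in the text.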
We should also note that the QPI signal output power at the different wavelength channels for both of the diffractive imager designs demonstrates a negative correlation with the illumination wavelength, as shown in Figure 3b and Figure S3b, Supporting Information. Stated differently, our diffractive QPI processors tend to direct shorter wavelengths to their corresponding virtual spectral filter locations more efficiently than longer wavelengths; for example, the sensor pixel group assigned to the shortest wavelength channel λ_{N_w} exhibits the strongest response. In addition, Figure 2b and Figure S2b, Supporting Information, reveal that the QPI performance of the individual spectral channels in terms of PSNR and SSIM values also improves at shorter wavelengths. These observations can be ascribed to the diffraction limit of light: compared to longer wavelengths, the effective number of trainable diffractive features that shorter wavelengths can utilize is larger, resulting in richer degrees of freedom to obtain better diffraction efficiencies and QPI performance at shorter wavelengths.
In the analyses presented in Figures 2 and 3, we used a constant α_test of 1, meaning that the input phase in each wavelength channel has a maximum phase contrast of π. To more comprehensively test the QPI performance of our diffractive imager designs, we further tested them using phase contrast factors (α_test) different from those used in the training (α_tr,max = 1). A total of 9 different values were selected within a range of [0.2, 1.8] as the values of α_test, and the same MNIST test image dataset was used for the phase encoding of the input objects. The resulting multispectral QPI images using the 9- and 16-channel diffractive designs are shown in Figure 4b and Figure S5b, Supporting Information, respectively, and their reconstruction quality was also quantified using the same SSIM and PSNR metrics (reported in Figure 4a and Figure S5a, Supporting Information) across all the target spectral bands and test objects. These analyses reveal that both the SSIM and PSNR values peak at α_test = 1, which matches the maximum phase contrast of the objects used during the training stage (α_tr ∈ [0.1, 1]). To the left of these peaks, where α_test < 1, the multispectral QPI performance of our diffractive processors degrades slightly. For instance, compared to their peak values obtained at α_test = 1, the resulting SSIM and PSNR values at α_test = 0.2 show a reduction of ~7.5% and ~12.8%, respectively, which can primarily be ascribed to the increased demand for phase sensitivity when resolving a 5-fold smaller input phase contrast [59]. However, when the testing input phase contrast exceeds π, i.e., α_test > α_tr,max = 1, the performance of our diffractive QPI designs exhibits more degradation. This becomes particularly noticeable as α_test approaches 2, which is expected since the training process did not involve any input objects with a phase contrast of α > 1, thus limiting the diffractive processor's ability to generalize to this range.
To gain deeper insights into the diffractive multispectral QPI processor's ability to resolve input object information with lower phase contrast, we performed an additional analysis by investigating the impact of the feature resolution and phase contrast of the input objects on the multispectral QPI performance of the trained diffractive networks. To standardize our tests, for the 9-channel multispectral QPI design, we created binary phase grating patterns with linewidths of 18.4 λ_m (~10.8 μm) and 9.2 λ_m (~5.4 μm), and selected the test phase contrast parameter α_test from {0.05, 0.1, 0.2}. The diffractive QPI signals resulting from these test objects using the 9-channel diffractive imager design shown in Figure 2a are depicted in Figure 5. Our results demonstrate that this diffractive imager with N_w = 9 can successfully resolve the test phase gratings with a linewidth of 18.4 λ_m for α_test ≥ 0.1. When using α_test = 0.1, if the test phase object is changed to the one with a 2-fold narrower linewidth of 9.2 λ_m, the diffractive QPI outputs become slightly harder to resolve for a small portion of the target spectral bands (e.g., λ_1 and λ_9). When using an input phase object with α_test > 0.1, our diffractive QPI design can clearly resolve spatial features with a linewidth of ≥9.2 λ_m (~5.4 μm) at all the 9 spectral bands of interest. Similarly, for the other diffractive QPI design with N_w = 16 target spectral bands, we also performed a spatial resolution and phase sensitivity analysis using binary phase patterns with linewidths of 25 λ_m (~14.4 μm) and 12.5 λ_m (~7.2 μm). Figure S6, Supporting Information, reveals that when using an input phase object with α_test ≥ 0.1, our diffractive QPI design can resolve linewidths of ≥12.5 λ_m (~7.2 μm) at all the 16 spectral bands of interest.
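Binary phase gratings of the kind used in this resolution analysis can be generated as below. This is a sketch; the horizontal stripe orientation and the pixel-level specification of the linewidth are our own illustrative choices:

```python
import numpy as np

def phase_grating(shape, linewidth_px, alpha):
    """Binary phase grating test object: stripes alternating between
    phase 0 and the maximum phase contrast alpha * pi, each stripe
    being linewidth_px pixels wide."""
    stripes = (np.arange(shape[0]) // linewidth_px) % 2
    return alpha * np.pi * np.repeat(stripes[:, None], shape[1], axis=1)
```

Sweeping `linewidth_px` and `alpha` over a grid, as done in Figure 5, separates the spatial-resolution limit of the imager from its phase sensitivity.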
The diffractive multispectral QPI processor designs reported so far were trained using a dataset of relatively simple spatial patterns, including handwritten digits and grating-like patterns. To further evaluate the generalization performance of our diffractive QPI designs on inputs with distinct spatial distributions, we conducted additional numerical tests using human Pap smear microscopy images, which exhibit substantially different spatial features compared to the training image datasets. These blind test results are showcased in Figure 6a, revealing a decent agreement between the diffractive multispectral QPI results and the corresponding ground truth images. We calculated the image quality metrics across the entire Pap smear test dataset (see Figure 6b), where the SSIM and PSNR values were found to be 0.863 ± 0.039 and 20.37 ± 2.20 dB, respectively. Overall, these successful external generalization test results demonstrate that our diffractive multispectral QPI design is not limited to specific object types or features, and can broadly serve as a multispectral quantitative phase imager for various objects.

DISCUSSION
The diffractive multispectral QPI framework reported in this work exhibits certain limitations. First, due to the wavelength-dependent diffraction effects, the QPI performance across the different spectral channels presents variations, with a maximum deviation of ~6% and ~8% observed in the 9- and 16-channel diffractive QPI designs, respectively. To address this issue, one can use a weight coefficient for the loss terms of the different spectral bands, which can be adaptively updated (increased/decreased dynamically) during the training stage based on the QPI performance imbalance among the spectral channels [62]. Second, as an initial proof-of-concept, our diffractive QPI designs presented in this manuscript were trained without optimization of the output diffraction efficiencies, which led to relatively low average diffraction efficiency values of ~0.79% and ~0.26% for the 9- and 16-channel diffractive QPI designs, respectively. A viable solution to increase the output diffraction efficiency of a diffractive QPI processor could involve incorporating an additional penalty term related to diffraction efficiency into the training loss function, albeit at the cost of a slight degradation in QPI performance. Third, during the actual implementation of our diffractive QPI designs, the performance of our diffractive networks can be markedly affected by misalignments and manufacturing errors. This issue can be partially mitigated by modeling these imperfections and incorporating them into our physical forward model in the form of random variations, which is referred to as the "vaccination" of diffractive networks [55]; vaccinated diffractive optical networks are in general more resilient against fabrication imperfections and exhibit a significantly better match between their numerical and experimental results [55,58,60,61,63,67,68].
In our diffractive multispectral QPI designs, each voxel in the desired multispectral data cube maps to a pixel in the output signal region (S) of the 2D monochrome sensor array; therefore, the total pixel count within S will be equal to the product of the number of wavelength channels (N_w) and the number of pixels per channel. The total pixel count of an image sensor array or focal plane array can be limited and hard to increase, as in the case of the infrared and terahertz parts of the spectrum. Thus, a careful balance must be considered in selecting N_w and the per-channel pixel count according to the needs of an application and the availability of high-pixel-count image sensor arrays. Furthermore, to fully leverage the pixel count (or sensing throughput) of a monochrome image sensor array for a QPI processor design, the diffractive optical front-end must achieve imaging with an adequate space-bandwidth product (SBP). To achieve QPI with a high SBP, or to perform high-resolution QPI with a larger number of spectral channels, it would be prudent to increase the depth of the diffractive optical network and concurrently increase the total number of diffractive neurons, so that the degrees of freedom provided by the diffractive processor can be enhanced to match the spatial and spectral complexities of the target transformation [56,57,60,62,69,70]. Also, a deeper diffractive architecture (i.e., with a larger number of diffractive layers, one following another) generally provides strong benefits in terms of output power efficiency, transformation accuracy, and spectral multiplexing capability [57,64,65].
In summary, we demonstrated the design of diffractive multispectral QPI processors that simultaneously measure the input object phase information across a large set of target spectral bands, without the need for any spectral filter arrays or digital phase recovery algorithms. These diffractive multispectral QPI processors maintain their performance and phase accuracy despite variations in the intensity of the broadband light sources used for illumination. With the selection of appropriate nano-/micro-fabrication methods, such as two-photon polymerization-based 3D printing [68,71,72], these diffractive optical processors can be physically scaled (expanded/shrunk) to function within different parts of the electromagnetic spectrum.

Optical forward model of the diffractive multispectral QPI processors
Our multispectral QPI framework is composed of K successively positioned diffractive layers, each containing thousands of spatially coded diffractive features. In our numerical forward model, these diffractive layers are treated as thin planar elements that introduce complex-valued transmission modulation to the incident coherent optical fields. For the k-th diffractive feature located on the q-th layer at the spatial position (x_k, y_k, z_q), its complex-valued transmission coefficient can be represented as a function of the material thickness value h_k^q, which is given by:

t_k^q(λ) = exp(−2π κ(λ) h_k^q / λ) · exp(j 2π (n(λ) − n_air) h_k^q / λ)   (6)

Here, n(λ) and κ(λ) represent the refractive index and the extinction coefficient of the used dielectric material, respectively, which correspond to the real and imaginary parts of the complex-valued refractive index ñ(λ), i.e., ñ(λ) = n(λ) + jκ(λ), and n_air ≈ 1 is the refractive index of air. In the numerical forward model of all the diffractive processor designs reported in this paper, we selected BK7 as the material of the diffractive layers [73], with the refractive index curve n(λ) provided in Figure S7, Supporting Information. Since the absorption of this material within the visible spectrum is negligible, κ(λ) is set to 0. The thickness value h of each diffractive feature is defined as a summation of two parts, h_learnable and h_base, namely:

h = h_learnable + h_base   (7)

where h_learnable refers to the learnable thickness value of each diffractive feature and is constrained within the range [0, h_max], and h_base is a constant representing the additional base thickness that acts as the substrate support for the diffractive features. In the diffractive designs of this paper, h_max is set to 664.5 nm, corresponding to a full phase modulation range from 0 to 2π for the longest wavelength (λ_1). h_base is empirically selected as 700 nm.
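The thickness-to-transmission mapping described above can be sketched as follows, assuming air (refractive index 1) on either side of the feature; the function name and argument layout are our own:

```python
import numpy as np

def transmission(h, wavelength, n, kappa=0.0):
    """Complex transmission coefficient of a thin diffractive feature of
    thickness h (as in Equation 6): the extinction coefficient kappa sets
    the amplitude attenuation, while the phase delay comes from the
    optical path difference (n - 1) * h relative to propagation in air."""
    amplitude = np.exp(-2.0 * np.pi * kappa * h / wavelength)
    phase = 2.0 * np.pi * (n - 1.0) * h / wavelength
    return amplitude * np.exp(1j * phase)
```

Since κ(λ) = 0 for BK7 in the visible, the modulation is purely a phase term, and each trainable thickness maps one-to-one onto a wavelength-dependent phase delay.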
To numerically model the free-space propagation of light between the diffractive layers, we adopted the Rayleigh-Sommerfeld scalar diffraction theory, in which the diffraction process is formulated as a linear, shift-invariant operator with an impulse response. We treated the $i$-th diffractive feature on the $k$-th layer at $(x_i, y_i, z_k)$ as the source of a secondary wave, generating a complex field at $(x, y, z)$ given by:

$$w_i^k(x, y, z; \lambda) = \frac{z - z_k}{r^2}\left(\frac{1}{2\pi r} + \frac{1}{j\lambda}\right)\exp\!\left(\frac{j 2\pi r}{\lambda}\right)$$

where $r = \sqrt{(x - x_i)^2 + (y - y_i)^2 + (z - z_k)^2}$ and $j = \sqrt{-1}$. These secondary waves created by the diffractive features on the $k$-th layer propagate to the next, $(k{+}1)$-th layer and are spatially superimposed. Therefore, the optical field incident on the $i'$-th diffractive feature of the $(k{+}1)$-th layer at $(x_{i'}, y_{i'}, z_{k+1})$ can be formulated as the convolution of the complex amplitude $u_i^k$ of the wave field right after the $i$-th diffractive feature on the $k$-th layer with the impulse response $w_i^k(x_{i'}, y_{i'}, z_{k+1}; \lambda)$. The resulting field is then modulated by the transmittance $t(x_{i'}, y_{i'}, z_{k+1}; \lambda)$ of the $(k{+}1)$-th diffractive layer, which can be expressed as:

$$u^{k+1}(x_{i'}, y_{i'}, z_{k+1}; \lambda) = t(x_{i'}, y_{i'}, z_{k+1}; \lambda) \sum_i u_i^k\, w_i^k(x_{i'}, y_{i'}, z_{k+1}; \lambda) \quad (9)$$
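A minimal numerical sketch of this propagation model, superposing the secondary waves via an FFT-based (circular) convolution of the field with the sampled Rayleigh-Sommerfeld impulse response. The grid size, pitch, and distance below are illustrative, and a rigorous implementation would also zero-pad the arrays to avoid wrap-around artifacts:

```python
import numpy as np

def rs_kernel(n, dx, dz, wavelength):
    """Rayleigh-Sommerfeld impulse response sampled on an n x n grid with
    pitch dx, for propagation over an axial distance dz:
    w = (dz / r^2) * (1/(2*pi*r) + 1/(j*lambda)) * exp(j*2*pi*r/lambda)."""
    coords = (np.arange(n) - n // 2) * dx
    X, Y = np.meshgrid(coords, coords, indexing="ij")
    r = np.sqrt(X**2 + Y**2 + dz**2)
    return (dz / r**2) * (1.0 / (2.0 * np.pi * r) + 1.0 / (1j * wavelength)) \
        * np.exp(1j * 2.0 * np.pi * r / wavelength)

def propagate(u, kernel):
    """Superpose the secondary waves of all features by convolving the field u
    with the propagation kernel, implemented here with FFTs (periodic sketch)."""
    return np.fft.ifft2(np.fft.fft2(u) * np.fft.fft2(np.fft.ifftshift(kernel)))

# Example: propagate a uniform field over ~9.39 um at lambda = 450 nm,
# sampled at 225 nm (half the shortest wavelength, as in the text)
u0 = np.ones((64, 64), dtype=complex)
u1 = propagate(u0, rs_kernel(64, 225e-9, 9.39e-6, 450e-9))
```

A uniform input stays uniform under this periodic convolution, which serves as a quick self-consistency check of the implementation.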
In the numerical modeling of all the diffractive designs in this paper, the spatial sampling rate of the simulated complex fields is set to half of the shortest wavelength, i.e., $0.5\lambda_{\min}$ = 225 nm, which also corresponds to the lateral size of a diffractive feature. All the axial distances between adjacent layers (including the diffractive layers as well as the input/output planes) are set to ~9.39 μm.

Numerical implementation of the diffractive multispectral QPI processors
A dispersive phase-only object is located at $z = z_0$; it has a uniform unit-amplitude distribution across all the spectral bands of interest and phase profiles $\phi_\Psi(x, y)$ that depend on the illumination wavelength $\lambda_\Psi$, where $\phi_\Psi \in \mathbb{R}^{N_x^{(i)} \times N_y^{(i)}}$. This object is illuminated by a broadband, spatially coherent source, resulting in multispectral input complex fields that can be written in the following form:

$$u(x, y; \lambda_\Psi) = \exp\!\left(j \phi_\Psi(x, y)\right) \quad (10)$$

where $u(x, y; \lambda_\Psi)$ represents the input complex field at $\lambda_\Psi$, which can also be represented using $u^1$, i.e., $u^1(x, y, z_0; \lambda_\Psi) = \exp(j\phi_\Psi(x, y))$. Starting from this input field $u$ (or $u^1$), the diffractive layer modulation and secondary wave generation processes outlined in the previous subsection are successively performed as the complex fields propagate through the diffractive network volume. This continues until the output plane of the diffractive processor, ultimately forming an output complex field $o_\Psi(x, y) = u^{K+1}(x, y, z_o; \lambda_\Psi)$, where $z_o$ denotes the output plane location. A monochrome image sensor array measures the output intensity distribution $I$ within this output FOV. Without loss of generality, we assumed an ideal, flat spectral responsivity for the image sensor and formulated $I$ as the integration of the diffractive output intensities across the different wavelength channels. The expression of $I$ can be found in Equation 1.
According to the layout presented in Figure S1, Supporting Information, the pixels in $I$ are spatially divided into two regions: an output signal region $S$ and a reference signal region $\mathcal{R}$, which are used to calculate the normalized output signal that represents the desired quantitative phase information. The detailed spatial layout of these signal subregions is also illustrated in Figure S1, Supporting Information. The output signal region $S$ has a pixel number of $N_x^{(S)} \times N_y^{(S)} = (N_x^{(o)} - 2) \times (N_y^{(o)} - 2)$, which can be divided into $N_x^{(i)} \times N_y^{(i)}$ blocks, each corresponding to an individual spatial pixel of the input object. Moreover, each one of these blocks consists of $\sqrt{\Lambda} \times \sqrt{\Lambda}$ discrete output pixels, where each output pixel is uniquely assigned to one of the $\Lambda$ wavelength channels; therefore we have $N_x^{(i)} = (N_x^{(o)} - 2)/\sqrt{\Lambda}$ and $N_y^{(i)} = (N_y^{(o)} - 2)/\sqrt{\Lambda}$. To obtain the quantitative phase information that corresponds to a certain wavelength channel $\lambda_\Psi$, a spatial demosaicing process $\mathcal{D}_\Psi$ is performed on the output intensity values within the output signal region $S$, which can be mathematically expressed (using zero-based pixel indices $m, n$) as:

$$o_\Psi[m, n] = \mathcal{D}_\Psi(I_S) = I_S\!\left[\sqrt{\Lambda}\, m + \left\lfloor \frac{\Psi - 1}{\sqrt{\Lambda}} \right\rfloor,\ \sqrt{\Lambda}\, n + \left((\Psi - 1) \bmod \sqrt{\Lambda}\right)\right]$$

where $I_S$ represents the part of $I$ located in the region $S$, the indexed pixels form $C_\Psi$, the collection of all the pixels within $S$ that are assigned to $\lambda_\Psi$, and $\lfloor\cdot\rfloor$ denotes the floor operation. With this demosaicing operation, the pixels belonging to $C_\Psi$ are extracted to form a new image with the same pixel count as the input object, $N_x^{(i)} \times N_y^{(i)}$. This is followed by a normalization operation using the calculated multispectral reference signal; see Equation 3.
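A minimal sketch of this demosaicing step in Python, assuming the per-block offset convention of $\lfloor(\Psi-1)/\sqrt{\Lambda}\rfloor$ rows and $(\Psi-1) \bmod \sqrt{\Lambda}$ columns; this is one plausible pixel-to-wavelength assignment, and the paper's exact mapping may differ:

```python
import numpy as np

def demosaic(i_s, num_bands):
    """Split the output-signal-region intensity i_s into one image per
    wavelength channel. Each sqrt(num_bands) x sqrt(num_bands) block holds
    one pixel per band; band p (zero-based) is assumed to sit at block
    offset (p // s, p % s) with s = sqrt(num_bands)."""
    s = int(round(np.sqrt(num_bands)))
    assert s * s == num_bands, "num_bands must be a perfect square"
    h, w = i_s.shape
    # reshape into (blocks_y, s, blocks_x, s), then pull out each offset
    blocks = i_s.reshape(h // s, s, w // s, s)
    return [blocks[:, p // s, :, p % s] for p in range(num_bands)]

# Example with the 9-band layout: an 84x84 signal region -> nine 28x28 images
signal = np.arange(84 * 84, dtype=float).reshape(84, 84)
channels = demosaic(signal, 9)
```

Each returned image then corresponds to one spectral channel and has one pixel per input-object pixel, ready for the reference-signal normalization step.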
For the diffractive multispectral QPI processor design shown in Figure 2a that operates with $\Lambda = 9$, the size of the input/output FOV is set to ~262.96 μm × 262.96 μm, indicating a unit-magnification system between the object and sensor planes. The input object has a pixel number of $N_x^{(i)} \times N_y^{(i)} = 28 \times 28$, which is the same as the default resolution of the MNIST images, resulting in each input pixel having a size of ~9.39 μm × 9.39 μm. The output signal region $S$ consists of $N_x^{(S)} \times N_y^{(S)} = 84 \times 84$ pixels, resulting in each output pixel having a size of ~3.13 μm × 3.13 μm. In the demosaicing process, these 84×84 pixels are grouped into 28×28 blocks that individually correspond to the spatial pixels of the input object, where each block contains 3×3 pixels, corresponding to the $\Lambda = 9$ target spectral bands. To achieve the multispectral QPI task, we designed the diffractive multispectral QPI processor to possess 1,000 × 1,000 diffractive features per layer, resulting in a diffractive area of ~390 μm × 390 μm per layer.
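As a sanity check, the quoted pixel sizes follow directly from the FOV and the pixel counts stated above:

```python
# Geometry of the 9-band design (values from the text): the FOV and the
# pixel counts fix the input/output pixel sizes and the block structure
fov = 262.96e-6              # input/output field of view, meters
n_in, n_out = 28, 84         # input-object / output-signal pixels per side

input_pixel = fov / n_in     # ~9.39 um per input-object pixel
output_pixel = fov / n_out   # ~3.13 um per output pixel
bands_per_block = (n_out // n_in) ** 2   # 3x3 pixels per block -> 9 bands
```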

Training loss function and image quality metrics
To train our diffractive QPI processors, we devised a normalized MSE-based loss function to penalize the structural difference between the diffractive output intensity distribution $o_\Psi(x, y)$ and its ground truth counterpart $o_\Psi^{(GT)}(x, y)$ at each spectral channel $\lambda_\Psi$. The loss for a specific wavelength channel was defined as:

$$\mathcal{L}_\Psi = \frac{1}{N_x^{(i)} N_y^{(i)}} \sum_{m, n} \left( \sigma_\Psi\, o_\Psi[m, n] - o_\Psi^{(GT)}[m, n] \right)^2$$

where $\sigma_\Psi$ is a normalization factor that matches the energy of $o_\Psi$ to that of $o_\Psi^{(GT)}$.
In the training stage, the loss function used for the multispectral QPI processor is calculated by averaging the loss terms across all the $\Lambda$ spectral channels, so that all the channels are optimized simultaneously. Therefore, the total loss function can be written as:

$$\mathcal{L} = \frac{1}{\Lambda} \sum_{\Psi=1}^{\Lambda} \mathcal{L}_\Psi$$
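The two-level loss above can be sketched as follows; the energy-normalization factor $\sigma_\Psi$ is implemented here as a simple total-energy ratio, which is one plausible choice and may differ from the paper's exact normalization:

```python
import numpy as np

def channel_loss(o, o_gt):
    """Energy-normalized MSE for one spectral channel: sigma rescales the raw
    output o so its total energy matches that of the ground truth o_gt
    (assumed normalization; the paper's exact factor may differ)."""
    sigma = o_gt.sum() / o.sum()
    return float(np.mean((sigma * o - o_gt) ** 2))

def total_loss(outputs, targets):
    """Average the per-channel losses over all spectral channels."""
    return float(np.mean([channel_loss(o, g) for o, g in zip(outputs, targets)]))

# A diffractive output that is a perfect (scaled) copy of the ground truth
# incurs zero loss, since sigma absorbs any global intensity scaling
rng = np.random.default_rng(0)
gt = [rng.random((28, 28)) + 0.1 for _ in range(9)]
```

Note that this loss is invariant to a global intensity scaling of each output channel, which is consistent with the reported robustness to illumination-intensity variations.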

Figure 2 .
Figure 2. The diffractive multispectral QPI processor design with Λ = 9 target spectral bands. a, Thickness profiles of the trained diffractive layers. b, SSIM and PSNR values of the diffractive QPI processor outputs at different target spectral bands.

Figure 3 .
Figure 3. Spectral cross-talk analysis for the outputs of the Λ = 9 channel diffractive multispectral QPI processor design shown in Figure 2a. Output image matrix illustrating the spectral cross-talk among the target wavelength channels.

Figure 4 .
Figure 4. Analysis of the impact of the input object phase contrast on the image quality of the diffractive multispectral QPI results using the Λ = 9 channel design shown in Figure 2a. SSIM and PSNR values of the resulting QPI measurements as a function of the phase contrast of the input object.

Figure 5 .
Figure 5. Spatial resolution and phase sensitivity analysis for the Λ = 9 channel diffractive multispectral QPI processor design shown in Figure 2a. Images of the binary phase grating patterns encoded within the phase channels of the input object, along with the resulting diffractive output QPI signals at the target spectral bands. The grating has a linewidth of 18.4 μm, and the phase contrast parameter of the input phase object is selected from {0.05, 0.1, 0.2}.

Figure 6 .
Figure 6. Results for testing the external generalization performance of the Λ = 9 channel diffractive multispectral QPI processor design using blind testing images from a new dataset composed of Pap smear images. a, Examples of the input (ground truth) phase images at different spectral bands, which are compared to their corresponding diffractive QPI measurements; a phase contrast parameter of 1 was used in the testing. b, SSIM and PSNR values of the diffractive multispectral QPI processor outputs at different target spectral bands.
For the diffractive multispectral QPI processor design with Λ = 16 spectral bands presented in the Supporting Information, the size of the input/output FOV is set to ~153.39 μm × 153.39 μm. The input object has a pixel number of $N_x^{(i)} \times N_y^{(i)} = 12 \times 12$, which is slightly smaller than the default resolution of the MNIST images, resulting in each input pixel having a size of ~12.52 μm × 12.52 μm. The output signal region $S$ consists of $N_x^{(S)} \times N_y^{(S)} = 48 \times 48$ pixels. In the demosaicing process, these 48×48 pixels are grouped into 12×12 blocks that individually correspond to the spatial pixels of the input object, where each block contains 4×4 pixels, corresponding to the Λ = 16 target spectral bands. The diffractive processor is designed to possess 784 × 784 diffractive features per layer, resulting in a diffractive area of ~307 μm × 307 μm per layer.
Here, $\sigma_\Psi$ represents the normalization factor used to normalize the energy of the output image $o_\Psi$ from the diffractive processor with regard to $o_\Psi^{(GT)}$, i.e., $\sigma_\Psi = \sum_{m,n} o_\Psi^{(GT)}[m, n] \big/ \sum_{m,n} o_\Psi[m, n]$.