Moment‐Based Shape‐Learning Holography for Fast Classification of Microparticles

Compact computational in-line holography based on deep learning is an attractive single-shot approach to imaging microparticles dispersed in a 3D volume. The particle shape contains valuable information for species classification, but acquiring datasets for network training is labor-consuming and inefficient because the 2D shapes of different particle species are complex and varied. Herein, moment-based shape-learning holography (MSLH) is proposed, in which the shape of a microparticle is mathematically characterized using varying weights of Zernike moments. By decomposing and recombining feature shapes of pollen particles into numerous new characteristic shapes as incremental data, MSLH improves the efficiency of dataset preparation. Depth-encoded shape learning is achieved using a U-net with a self-attention mechanism, which enables fast axial depth determination. The shape reconstruction uses a wavelet-based method with more explicit physical meaning, making MSLH a hybrid data-and-model-driven approach that requires less primary data. Validation results show that MSLH achieves high accuracy in axial position and shape reconstruction while maintaining good classification effectiveness. It is believed that MSLH is an easy-to-set-up, efficient-to-construct, and fast-to-output approach for shape-based classification of 3D distributed microparticles in dynamic fluids.

DOI: 10.1002/adpr.202300120
Benefiting from the development of artificial intelligence, retrieval algorithms based on deep learning are strong candidates for simplifying the retrieval process and further improving computing speed. Many studies on learning-based holography for 3D distributed particles have been reported in recent years. [25][26][27][28][29] The compact setup known as lensless in-line holographic microscopy (LIHM), which benefits from a large space-bandwidth product, also enables imaging of the shapes of 3D distributed particles. LIHM has been successfully applied in phase imaging, 3D imaging, super-resolution imaging, on-chip microscopy, and other areas. [30][31][32][33][34][35] These studies have revealed that neural networks have great advantages in solving the nonlinear and complex inverse problem of retrieving particle information from a single-shot hologram. However, an obvious drawback of this approach is that training samples cannot be prepared conveniently: training a neural network requires a large number of samples, each consisting of a raw hologram as input and its ground-truth target. An improved method that can quantitatively characterize particle shape and resolve the data acquisition problem is therefore urgently needed.
In this article, we propose moment-based shape-learning holography (MSLH) to overcome these shortcomings and realize compact in-line holographic microscopy for shape imaging and classification of 3D dispersed microparticles. Zernike moments are adopted to characterize the particle shape; thus, arbitrary shapes can be generated by varying the weights of different moments. [36][37][38][39][40] Different species of pollen particles are specifically studied, whose shapes are decomposed and recombined via Zernike moments to generate numerous virtual characteristic shapes. This process remarkably simplifies the preparation of shape-based data. Moreover, instead of the traditional LAF, MSLH uses a depth-encoded shape-learning network built on a self-attention U-net to rapidly predict the depths of all 3D dispersed particles simultaneously. A subsequent wavelet-based algorithm realizes the shape reconstruction, so the whole process is a hybrid data-and-model-driven approach with more explicit physical meaning than end-to-end learning methods. The prediction accuracy of the axial position and the reconstruction quality of the shape profile are both quantitatively analyzed, and an open-source Yolo network is adopted at the terminal to verify the classification effectiveness.

Principle of the MSLH
The optical setup of MSLH was a compact in-line holographic microscope, as shown in Figure 1a, in which a partially coherent light-emitting diode (LED, Thorlabs M470F3, central wavelength 470 nm, 20 nm bandwidth, 17.2 mW) was adopted as the light source. Its beam was expanded and collimated to illuminate the sensor of a camera (LUCID PHX122S-MNL, pixel size 1.85 μm, 4024 × 3036 pixels) at normal incidence. The particles were distributed in a compact transparent channel located in the collimated optical path, a few millimeters from the sensor, and formed a digital hologram I_holo on the camera sensor. The information on particle shape is contained in the complex amplitude U of the wavefront near the particle surface. The simplest approach to retrieve it is the angular spectrum method (ASM), in which the holographic process is approximated as a diffraction problem. The retrieved U is expressed as

$$U(x, y, z) = \mathrm{FFT}^{-1}\left\{\mathrm{FFT}\left[I_{\mathrm{holo}}(x, y)\right] \cdot H(u, v, z)\right\} \tag{1}$$

where FFT and FFT⁻¹ denote the Fourier transform and its inverse, x and y are lateral coordinates, u and v are the corresponding spatial frequencies, and z is the axial coordinate, that is, the distance between particle and sensor. H in Equation (1) is the transfer function of light in free space, expressed as

$$H(u, v, z) = \exp\left(ikz\sqrt{1 - (\lambda u)^2 - (\lambda v)^2}\right) \tag{2}$$

where λ is the wavelength, k = 2π/λ is the wavenumber, and i is the imaginary unit. Equation (1) shows that the retrieved U is a function of z, the distance relative to the sensor. When z approaches the actual axial location of the particle, U near the particle edge becomes sharper, which can be used to image the particle shape and locate the focal plane of the particle. [24]

Figure 1. Schematic diagram of the proposed MSLH. a) The setup of the in-line holographic system for imaging 3D dispersed particles. b) The computational shape-reconstruction and shape-based classification process. The shape-learning network is a self-attention-enhanced U-net, which is trained via decomposition and recombination of Zernike moments. The reconstructed shape image is encoded with depth information via grayscale. The Yolo classifier is used at the terminal to verify the classification effects.

www.advancedsciencenews.com www.adpr-journal.com

This computational process is relatively complex and time-consuming for a particle field containing a large number of particles. To overcome this obstacle, we constructed a depth-encoded shape-learning network, which is a U-net enhanced by self-attention (SAU-net), as shown in Figure 1b. The pretrained SAU-net can rapidly predict the axial depths and shapes of all particles. However, the complex nonlinear relationship between the predicted shape and the hologram is learned from limited data rather than derived from physical law, which means the predicted shape is not rigorous and may pose greater challenges for shape-based classification. Hence, only the predicted axial depth was adopted as useful information, since the relationship between axial depth and the hologram is relatively simple in theory and can be predicted more accurately than the shape.
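The refocusing and focal-plane search described above can be sketched in Python/NumPy as follows. This is a minimal sketch, not the authors' code: the pixel pitch (1.85 μm) and wavelength (0.47 μm) are taken from the setup, while the gradient-energy sharpness metric and the brute-force z scan are illustrative assumptions and are not the Sog- or Sobel-based LAF metrics cited in this article. Because the hologram is real-valued, |U(z)| = |U(−z)|, so the sign convention for z does not affect amplitude-based focusing.

```python
import numpy as np

def asm_propagate(field, z, wavelength=0.47, dx=1.85):
    """Angular spectrum method, Eqs. (1)-(2); all lengths in micrometres."""
    ny, nx = field.shape
    u = np.fft.fftfreq(nx, d=dx)                 # spatial frequencies along x (1/um)
    v = np.fft.fftfreq(ny, d=dx)                 # spatial frequencies along y (1/um)
    uu, vv = np.meshgrid(u, v)
    arg = 1.0 - (wavelength * uu) ** 2 - (wavelength * vv) ** 2
    k = 2 * np.pi / wavelength
    # transfer function H(u, v, z); evanescent components (arg <= 0) are dropped
    H = np.where(arg > 0, np.exp(1j * k * z * np.sqrt(np.clip(arg, 0, None))), 0)
    return np.fft.ifft2(np.fft.fft2(field) * H)

def sharpness(field):
    """Illustrative focus metric: mean squared gradient of the amplitude |U|."""
    gy, gx = np.gradient(np.abs(field))
    return float(np.mean(gx ** 2 + gy ** 2))

def autofocus(holo, z_candidates, **kwargs):
    """Brute-force focus search: one full reconstruction per candidate depth."""
    scores = [sharpness(asm_propagate(holo, z, **kwargs)) for z in z_candidates]
    return z_candidates[int(np.argmax(scores))], scores
```

Scanning the full 1-3.55 mm range at 10 μm steps, e.g. with `np.arange(1000, 3550, 10)`, requires 255 full reconstructions per hologram in this sketch, which illustrates why such a search becomes time-consuming for fields containing many particles.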
Although Equation (1) can be used to retrieve the shape image of particles, U contains the complex amplitude at the particle surface as well as the conjugate term and the zero-order term, which means the reconstructed shape image of the particle is disturbed by the twin image. This is unfavorable for the subsequent shape reconstruction. Hence, we used a wavelet-based method with low-pass filtering properties to reconstruct the amplitude of the wavefront, which is expressed as [23]

$$I(x, y, z) = 1 - I_{\mathrm{holo}}(x, y) \otimes \psi_z(x, y) \tag{3}$$

where ⊗ denotes 2D convolution and ψ_z is the wavelet function for depth z, whose definition and expression can be found in ref. [23]. Although shape reconstruction based on Equation (3) loses some high-frequency detail, it preserves the characteristic information of the particle shape, suppresses background noise such as the zero-order and twin images, and is fast. Based on Equation (3) and the axial depth predicted by the SAU-net, we directly implemented the shape reconstruction. In the acquired shape image I(x, y, z), the value of z was encoded as a grayscale value. At the terminal of MSLH, we used an open-source Yolo network to verify the classification of three pollen species, that is, pine pollen, peach pollen, and corn pollen, as shown in Figure 1b. Throughout the whole process, the networks, including the SAU-net and the Yolo classifier, are the key components of MSLH and require a large amount of training data. Before dataset preparation, the question of how to quantitatively characterize the shape information of a particle must be addressed first.
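Equation (3) is a single convolution followed by a subtraction, which is why it is fast. The sketch below evaluates it with FFTs under stated assumptions: the wavelet ψ_z of ref. [23] is not reproduced, so the caller must supply a real-valued kernel sampled on the hologram grid and centred at pixel (ny // 2, nx // 2), and the convolution is circular (no padding). The function name `wavelet_reconstruct` is hypothetical.

```python
import numpy as np

def wavelet_reconstruct(holo, psi_z):
    """Eq. (3): I(x, y, z) = 1 - I_holo (conv) psi_z, as an FFT circular convolution.

    psi_z: real kernel on the hologram grid, centred at (ny // 2, nx // 2).
    The actual wavelet of ref. [23] must be supplied by the caller.
    """
    if holo.shape != psi_z.shape:
        raise ValueError("psi_z must be sampled on the hologram grid")
    kernel_ft = np.fft.fft2(np.fft.ifftshift(psi_z))   # move kernel centre to index (0, 0)
    conv = np.fft.ifft2(np.fft.fft2(holo) * kernel_ft).real
    return 1.0 - conv
```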

Characterization of the Shape
Unlike particle size and other simple parameters, particle shape is a complex 2D characteristic that varies considerably between particle species. Moment methods provide an effective and rigorous mathematical representation for geometric characterization. Here we used the Zernike moments Z_nm to carry the features of the particle shape image. Because the particles have random orientations in the particle field, polar coordinates (ρ, θ) are adopted, and the shape profile f is expressed as [37]

$$f(\rho, \theta) = \sum_{n=0}^{N} \sum_{m} Z_{nm} V_{nm}(\rho, \theta) \tag{4}$$

where n is the order of the moments and polynomials, and m is an integer satisfying |m| ≤ n with n − |m| even. The V_nm are Zernike polynomials, which form a complete orthogonal set on the unit disk and are expressed as

$$V_{nm}(\rho, \theta) = R_{nm}(\rho)\, e^{im\theta} \tag{5}$$

where the radial polynomial R_nm is defined as

$$R_{nm}(\rho) = \sum_{s=0}^{(n-|m|)/2} \frac{(-1)^s (n-s)!\, \rho^{n-2s}}{s!\, \left(\frac{n+|m|}{2}-s\right)!\, \left(\frac{n-|m|}{2}-s\right)!} \tag{6}$$

Equation (4) is invertible, and the Zernike moments Z_nm can be obtained from the shape profile as

$$Z_{nm} = \frac{n+1}{\pi} \int_{0}^{2\pi} \int_{0}^{1} f(\rho, \theta)\, V_{nm}^{*}(\rho, \theta)\, \rho \, d\rho \, d\theta \tag{7}$$

Zernike moments are global shape descriptors whose magnitudes are invariant to rotation, so they are well suited to extracting the shape features of randomly oriented particles; they concentrate the shape information into a compact set of coefficients and reduce information redundancy. [36][37][38][39][40] Microparticles of the same species usually have similar shapes, so Zernike moments can be used to generate virtual shapes with the same characteristics. Here we propose a random-weight method to generate numerous virtual shapes f_s, expressed as

$$f_s(\rho, \theta) = \sum_{p} \sum_{n=0}^{N} \sum_{m} \eta_{pnm} Z_{pnm} V_{nm}(\rho, \theta) \tag{8}$$

where the subscripts s and p are the serial numbers of the generated shape and the mother shape, respectively, Z_pnm are the Zernike moments of the pth mother shape, and η_pnm are the weight coefficients assigned to each Zernike moment. The values of η_pnm are drawn randomly between 0 and 1 for each n and m, subject to the condition that the weights over all mother shapes p sum to 1. When computing Zernike moments, the number of orders N must be considered, since a larger N improves the reconstruction but consumes more computation time.
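Equations (4)-(7) can be checked numerically with a small NumPy sketch. It samples the unit disk on a square pixel grid and approximates the integral in Equation (7) by a pixel sum; the grid mapping and this discretization are assumptions, not details given in the text. The explicit factorial form of Equation (6) is used for clarity; at orders as high as N = 60 it may lose precision, and recursive evaluations of R_nm are a common alternative.

```python
import numpy as np
from math import factorial

def radial_poly(n, m, rho):
    """R_nm(rho) from Eq. (6); zero unless n - |m| is even."""
    m = abs(m)
    out = np.zeros_like(rho, dtype=float)
    if (n - m) % 2:
        return out
    for s in range((n - m) // 2 + 1):
        c = (-1) ** s * factorial(n - s) / (
            factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s))
        out += c * rho ** (n - 2 * s)
    return out

def zernike_poly(n, m, rho, theta):
    """V_nm(rho, theta) = R_nm(rho) exp(i m theta), Eq. (5)."""
    return radial_poly(n, m, rho) * np.exp(1j * m * theta)

def polar_grid(size):
    """Polar coordinates of pixel centres on a size x size grid spanning [-1, 1]^2."""
    c = (np.arange(size) + 0.5) / size * 2.0 - 1.0
    x, y = np.meshgrid(c, c)
    rho = np.hypot(x, y)
    return rho, np.arctan2(y, x), rho <= 1.0

def index_pairs(N):
    """All (n, m) with 0 <= n <= N, |m| <= n and n - |m| even."""
    return [(n, m) for n in range(N + 1) for m in range(-n, n + 1, 2)]

def zernike_moments(f, N):
    """Pixel-sum approximation of Eq. (7) for a square image f sampled on the unit disk."""
    rho, theta, disk = polar_grid(f.shape[0])
    dA = (2.0 / f.shape[0]) ** 2                    # area of one pixel in disk units
    r, t, vals = rho[disk], theta[disk], f[disk]
    return {(n, m): (n + 1) / np.pi * np.sum(vals * np.conj(zernike_poly(n, m, r, t))) * dA
            for n, m in index_pairs(N)}

def reconstruct(moments, size):
    """Eq. (4): sum of Z_nm V_nm over the supplied moments (real part, inside the disk)."""
    rho, theta, disk = polar_grid(size)
    f = np.zeros((size, size), dtype=complex)
    for (n, m), z in moments.items():
        f += z * zernike_poly(n, m, rho, theta)
    return np.where(disk, f.real, 0.0)
```

Because the polynomials are orthogonal, a profile equal to R_20 yields Z_20 ≈ 1 and near-zero values for the other low-order moments, which is a quick sanity check of the normalization factor (n + 1)/π.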
An appropriate N should be selected for each specific application. In this article, pollen particles were used for classification verification. Specifically, three species of pollen particles with different geometric morphologies were adopted, that is, pine pollen, peach pollen, and corn pollen. Their mother shapes and the generated virtual shapes under different N are shown in Figure 2, in which the mother shapes were reconstructed via the wavelet-based algorithm and binarization. In this process, the pollen particles were prepared on a thin slice whose distance from the sensor was obtained via the LAF algorithm. Figure 2 shows that the value of N significantly affects the generated shape images. Fewer orders (N = 40) cause feature loss, while the shapes with N = 60 and N = 80 retain the features of the three pollen species. To distinguish the three pollen species while saving computational cost, we selected N = 60 as the number of orders to be calculated.
With Equation (8), we can generate numerous virtual shapes with the same characteristics as the mother shapes. Specifically, for each species of pollen particle, we acquired 10 mother shapes in experiment, of which 3 were randomly selected to be decomposed into Zernike moments and input into Equation (8). After recombination, 100 new shapes with slight differences were synthesized for each species, consistent with the fact that the shape images of different pollen particles of the same species also show slight distortions in practice. It should be noted that the trained network would overfit if only limited experimental data (here, 10 mother shapes) were used, because it would learn only the mapping between these experimental samples and their outputs rather than the general mapping relationship, and its generalization ability would be unsatisfactory. To address this issue, MSLH generates numerous feature shapes from a small amount of experimental data. These virtually generated shape images, together with the small number of experimental mother shapes, constitute the training targets of the following shape-learning network.
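The random-weight recombination of Equation (8) is linear in the moments, so it can be carried out directly on the moment coefficients before the images are rebuilt with Equation (4). In this sketch the weights are drawn uniformly and then normalized so that, for each (n, m), they sum to 1 over the mother shapes; this particular sampling scheme is an assumption, since the text states only the range and the sum constraint. The function name `recombine_moments` is hypothetical.

```python
import numpy as np

def recombine_moments(mother_moments, num_shapes, rng=None):
    """Eq. (8) in moment space: Z_s[k] = sum_p eta[s, p, k] * Z_p[k].

    mother_moments: array (P, K), one row per mother shape p and one column per
    (n, m) pair in a fixed order. Returns (new moments (S, K), weights (S, P, K)).
    """
    rng = np.random.default_rng(rng)
    Z = np.asarray(mother_moments)
    P, K = Z.shape
    eta = rng.uniform(size=(num_shapes, P, K))      # raw weights in [0, 1)
    eta /= eta.sum(axis=1, keepdims=True)           # for each (n, m): sum over p equals 1
    return np.einsum("spk,pk->sk", eta, Z), eta
```

For the setting described here, `mother_moments` would have 3 rows (mother shapes) and, for N = 60, (N + 1)(N + 2)/2 = 1891 columns (all valid (n, m) pairs), with `num_shapes=100` per species. If all mother shapes were identical, every generated shape would reproduce them exactly, since the weights sum to 1.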

Shape-Learning Network
Figure 2 shows that the shape image of a pollen particle contains characteristic information for classifying its species. To realize fast, single-shot shape imaging of 3D distributed particles, we constructed a depth-encoded shape-learning network, as shown in Figure 3. It mainly consists of a U-shaped network whose key feature is a self-attention block introduced at the end of the encoder. The self-attention block aims to focus on information of interest, such as particle shapes and locations, rather than the whole hologram during training. [41] In a practical particle field, there are always many irrelevant smaller impurities such as broken particles, fine dust, and bubbles. The holographic patterns formed by these particles, as shown in the input image in Figure 3, raise the background noise and degrade the prediction and classification performance of the networks. With the self-attention mechanism, the network learns to assign more weight to the holographic patterns of the particles of interest, which have characteristic shapes, and less weight to the relatively faint holographic patterns formed by smaller impurities, thereby improving prediction accuracy.
The encoder and decoder of the shape-learning network are composed of four convolution blocks and four deconvolution blocks, respectively. Each convolution block contains two units, each consisting of a convolution layer, a batch normalization layer, and a ReLU layer, followed by a max-pooling layer that downsamples the result and passes it to the next block. The convolution kernel size is set to 3 × 3 pixels, and the stride is 1 pixel (s = 1). The batch normalization layer normalizes each channel to zero mean and unit variance (followed by a learnable scale and shift), which mitigates vanishing or exploding gradients and improves the stability of network training. The ReLU layer provides the nonlinear activation.
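A forward pass through one convolution block, as described above, can be sketched in NumPy to make the layer order concrete and to show what batch normalization actually produces. Assumptions not stated in the text: zero padding of one pixel (so the 3 × 3 convolutions preserve size), a single image so that the normalization statistics are taken over spatial positions only, and omission of the learnable scale and shift.

```python
import numpy as np

def conv3x3(x, w, b):
    """3 x 3 convolution (cross-correlation, as in deep-learning layers), stride 1, zero padding 1.
    x: (C_in, H, W); w: (C_out, C_in, 3, 3); b: (C_out,)."""
    _, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], H, W))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oi,ihw->ohw", w[:, :, dy, dx], xp[:, dy:dy + H, dx:dx + W])
    return out + b[:, None, None]

def batch_norm(x, eps=1e-5):
    """Per-channel normalization to zero mean and unit variance (no learnable scale/shift)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(x, 0.0)

def maxpool2(x):
    """2 x 2 max pooling with stride 2 (H and W must be even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def conv_block(x, params):
    """Two (conv -> BN -> ReLU) units followed by max pooling; params = [(w1, b1), (w2, b2)]."""
    for w, b in params:
        x = relu(batch_norm(conv3x3(x, w, b)))
    return maxpool2(x)
```

Note that the normalized values are centred on zero and are not confined to [0, 1]; it is the subsequent ReLU that makes the block output non-negative. Four such blocks halve a 256 × 256 input four times, to 16 × 16.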
Figure 3. Framework of the depth-encoded shape-learning network (left) and dataset preparation process (right). The network is a self-attention U-net, in which the structures of the self-attention block, convolution block, and deconvolution block are given in detail. The holography on the right is the simulated dataset preparation process, in which practical conditions such as the partial coherence of illumination and noise from impurities are considered. The hologram and shape image in the dataset preparation process correspond to the input and output images of the network.

The output of the encoder enters the self-attention block, which consists of three branches, namely key, query, and value. The similarity weights are obtained by multiplying the query with each key and normalizing the result with the softmax function; the weighted sum of the values using these weights gives the attention value. The output channels of the query and key branches were set to 1/8 of the input channels of the attention block. The deconvolution block contains three units, each with the same structure as those in the convolution block, as shown in Figure 3. In each block, the size and dimension of the input are first adjusted via an upsampling layer and a convolutional layer, and the result then enters the next two units. At the same time, the skip-connected encoder output of the corresponding size is also fed into these two units. The scale factor of the upsampling layer was set to 2, the convolution kernel size to 3 × 3 pixels, and the stride to 1 pixel. Finally, a convolution block with a kernel size of 1 × 1 pixel adjusts the output to the size of the original input. The specific network parameters are shown in Figure 3.
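The query/key/value computation described above corresponds to the following NumPy sketch, in which 1 × 1 convolutions are written as channel-mixing matrices. The query and key projections have C/8 output channels as stated; giving the value branch C channels, and returning the attention value without a residual connection, are assumptions (SAGAN-style blocks, for example, add a learnable residual term, but the text does not say whether the SAU-net does).

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)          # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """x: (C, H, W); Wq, Wk: (C // 8, C); Wv: (C, C). Returns (attention value, weights)."""
    C, H, W = x.shape
    flat = x.reshape(C, H * W)                       # each column is one spatial position
    q = Wq @ flat                                    # query, (C/8, HW)
    k = Wk @ flat                                    # key, (C/8, HW)
    v = Wv @ flat                                    # value, (C, HW)
    weights = softmax(q.T @ k, axis=-1)              # (HW, HW): position i attends to all j
    out = v @ weights.T                              # out[:, i] = sum_j weights[i, j] * v[:, j]
    return out.reshape(C, H, W), weights
```

Each row of `weights` sums to 1, so every output position is a convex combination of value vectors; positions whose features match the query strongly (e.g. particle fringes rather than faint impurity patterns) can dominate that combination.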
The input and output of the SAU-net are the digital hologram of a particle field and the predicted mask, respectively, in which the axial depth of each particle is encoded in grayscale. Training the SAU-net requires a large number of holograms and ground-truth masks. Because practical particle fields are dynamic and complex, the axial positions and morphological information of the particles cannot be measured directly, so the ground truth remains unknown. Although it can be estimated via the LAF algorithm, manual parameters must be adjusted in the process and the results are affected by particle concentration and other factors. [25] Moreover, acquiring experimental data is labor-consuming and inefficient, which makes dataset preparation from experiments alone challenging. Benefiting from the mathematical characterization method described above, we can generate numerous simulated data from a small amount of experimental data via the decomposition and recombination of Zernike moments, as shown on the right side of Figure 3.
The experimental data were obtained with the compact in-line holography system shown in Figure 1a. Microparticles of three pollen species, that is, pine pollen (provided by Yier Apiary), peach pollen (provided by Promanor), and corn pollen (provided by Miman Apiary), were used in the experiment. We dispersed the particles in a microfluidic channel and captured ten full-field raw images (4024 × 3036 pixels) of different samples, which were cropped into 150 subimages of 256 × 256 pixels. The samples were confined in a microchannel within the axial range of 1-3.55 mm from the CMOS sensor. This range was divided into 255 equidistant intervals with a step size of 10 μm, and the nth interval corresponded to the nth of the 255 gray levels. The captured holograms were processed by manual threshold adjustment and the LAF algorithm to obtain the exact axial position of each particle. The axial position was encoded as the corresponding gray value and assigned to the shape profile to form a depth-encoded mask. About 600 pairs of raw holograms and corresponding depth-encoded masks were generated through data augmentation such as rotation.
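The depth-to-gray mapping above (255 intervals of 10 μm starting at 1 mm, so 1000 + 255 × 10 = 3550 μm) can be written as a pair of small functions. The sketch assumes the nth interval is [1000 + 10(n − 1), 1000 + 10n) μm and decodes each gray value to the centre of its interval; both conventions are assumptions, since the text states only that the nth interval maps to the nth gray value. The function names are hypothetical.

```python
import numpy as np

Z_MIN_UM = 1000.0   # start of the axial range (1 mm)
STEP_UM = 10.0      # interval width
LEVELS = 255        # number of intervals / gray levels

def depth_to_gray(z_um):
    """Map depth (um) to gray value n, where interval n covers [1000 + 10(n-1), 1000 + 10n)."""
    n = np.floor((np.asarray(z_um, dtype=float) - Z_MIN_UM) / STEP_UM).astype(int) + 1
    return np.clip(n, 1, LEVELS)

def gray_to_depth(gray):
    """Decode a gray value to the centre of its interval (assumed convention)."""
    return Z_MIN_UM + (np.asarray(gray, dtype=float) - 0.5) * STEP_UM

def depth_encoded_mask(shape_mask, z_um):
    """Binary shape mask -> uint8 mask whose particle pixels carry the gray-coded depth."""
    return np.where(shape_mask > 0, depth_to_gray(z_um), 0).astype(np.uint8)
```

Gray value 0 is left for the background, which is why the 255 depth levels occupy values 1-255.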
Based on the experimental data, another 19 400 pairs of simulated data were generated following the process shown in Figure 3. First, a random shape mask (256 × 256 pixels) of a specific pollen species was generated by composing Zernike moments. Its gray value was randomly chosen from 1 to 255, mapping to the axial range of 1-3.55 mm with 255 equidistant intervals, to match the experimental system. Then, several shapes (three in Figure 4) were randomly rotated, translated, and overlaid in one mask. Some fine impurity particles 1-3 pixels in size were added to the background as noise to improve realism. Finally, using the axial positions and the ASM, the holograms of the corresponding particles were simulated. During the simulation, the partial coherence of the light source was considered, corresponding to the LED used in the experiment. In total, 20 000 pairs (600 experimental and 19 400 simulated) of holograms and corresponding depth-encoded masks were obtained as the training dataset of the SAU-net. The training process is equivalent to a mathematical optimization problem, and the loss function evaluates the difference between the predicted depth-encoded shape image I_pr and the ground truth I_tr. The loss function is expressed as

$$\mathrm{Loss}(I_{pr}, I_{tr}) = \left\| I_{pr} - I_{tr} \right\|_1 + \gamma \left\| I_{pr} \right\|_1 \tag{9}$$

where ||·||_1 denotes the ℓ1 norm, which is conducive to sparse optimization. [42] I_pr is the predicted result and I_tr is the target label of the network. The first term in Equation (9) represents the difference between I_pr and I_tr, and the second term drives I_pr toward zero to reduce background noise in the region outside the particles. The coefficient γ is a hyperparameter, set to 0.01. The Adam optimizer is used to optimize the network parameters. [43] The batch size was set to 16 and the number of epochs to 50.
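The hologram-simulation step described above can be sketched under a simple forward model. This is one plausible model, not necessarily the one used by the authors: particles are treated as opaque binary masks, their scattered waves are propagated with the angular spectrum method and superposed without multiple scattering, and partial coherence is approximated by averaging intensities over several wavelengths within the LED band; spatial coherence from the finite source size is not modelled. The carrier phase exp(ikz) is removed from H so that all waves share the same reference. The function names are hypothetical.

```python
import numpy as np

def asm_transfer(shape, z, wavelength, dx):
    """H(u, v, z) of Eq. (2) with the carrier phase exp(ikz) removed (relative phase only)."""
    ny, nx = shape
    uu, vv = np.meshgrid(np.fft.fftfreq(nx, d=dx), np.fft.fftfreq(ny, d=dx))
    arg = 1.0 - (wavelength * uu) ** 2 - (wavelength * vv) ** 2
    k = 2 * np.pi / wavelength
    return np.where(arg > 0, np.exp(1j * k * z * (np.sqrt(np.clip(arg, 0, None)) - 1.0)), 0)

def simulate_hologram(masks, depths_um, wavelengths_um, dx_um=1.85):
    """Intensity hologram of opaque particles at different depths (lengths in um).

    Each particle blocks the unit plane wave; its scattered wave (-mask) is propagated
    to the sensor and all scattered waves are superposed (no multiple scattering).
    Partial temporal coherence: intensities are averaged over the given wavelengths.
    """
    shape = masks[0].shape
    holo = np.zeros(shape)
    for lam in wavelengths_um:
        field = np.ones(shape, dtype=complex)        # unit-amplitude reference wave
        for mask, z in zip(masks, depths_um):
            H = asm_transfer(shape, z, lam, dx_um)
            field -= np.fft.ifft2(np.fft.fft2(mask.astype(float)) * H)
        holo += np.abs(field) ** 2
    return holo / len(wavelengths_um)
```

For example, `simulate_hologram([mask], [2000.0], np.linspace(0.46, 0.48, 5))` averages five wavelengths across a 20 nm band around 470 nm; the number of spectral samples is an arbitrary choice. At z = 0 the model reduces to the shadow 1 − mask, which is a convenient sanity check.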
A step learning-rate decay strategy was adopted, with an initial learning rate of 0.0001 that was multiplied by 0.8 every 10 epochs. The training platform was an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40 GHz with an NVIDIA GeForce RTX 3080 10G, and the training time was about 4 h.
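Equation (9) and the learning-rate schedule translate directly into the sketch below. Here the ℓ1 norms are plain sums of absolute values; many implementations use means instead, which only rescales the loss. The function names are hypothetical.

```python
import numpy as np

def mslh_loss(pred, target, gamma=0.01):
    """Eq. (9): ||I_pr - I_tr||_1 + gamma * ||I_pr||_1, with l1 norms as sums of |.|."""
    return float(np.abs(pred - target).sum() + gamma * np.abs(pred).sum())

def learning_rate(epoch, base=1e-4, factor=0.8, every=10):
    """Step decay: the rate is multiplied by 0.8 after every 10 epochs."""
    return base * factor ** (epoch // every)
```

Over 50 epochs the rate therefore steps through five values, ending at 1e-4 × 0.8⁴ ≈ 4.1e-5. In PyTorch, the same schedule corresponds to `torch.optim.Adam(params, lr=1e-4)` combined with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.8)`, stepped once per epoch.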

Axial Accuracy
As elaborated above, although the shape-learning network can quickly and accurately predict depth-encoded shape images, only the axial depth is used for the subsequent rigorous shape reconstruction. To evaluate the performance of the SAU-net in axial depth prediction, we construct a simulated particle field in which 15 pollen particles are randomly distributed on a slice at the same axial position, as shown in Figure 4. Four planes at distances of 1200, 1800, 2400, and 3000 μm from the sensor are considered. The image size is set to 1024 × 1024 pixels, and the other parameters are the same as those used in dataset preparation. The corresponding hologram is generated with the same method. The hologram is then cropped into 49 subimages of 256 × 256 pixels, matching the input size of the SAU-net; these subimages overlap, and the overlapping areas are used for splicing. The outputs of the trained SAU-net are re-spliced to obtain a predicted mask of the same size as the initial mask, which is also encoded with axial position via grayscale. Figure 4 shows scatter plots of the predicted positions in 3D view, comparing the SAU-net with two LAF methods, namely Sog-based LAF and the recently published Sobel-based LAF. [24,44] The depths retrieved by the LAF methods fluctuate dramatically, while the axial positions predicted by the SAU-net are closer to the truth at all four distances, showing that the SAU-net is robust in axial position prediction. Quantitatively, for each distance, we construct 6 random particle fields and obtain the statistics in Table 1. Two indexes are used for accuracy evaluation, namely mean absolute error (MAE) and root mean square error (RMSE), whose definitions and expressions can be found in ref. [45].
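The cropping into 49 overlapping 256 × 256 subimages works out exactly for a 1024 × 1024 image with a 7 × 7 grid and a stride of (1024 − 256)/6 = 128 pixels. The sketch below crops, re-splices, and evaluates MAE and RMSE; averaging overlapping predictions is an assumed splicing rule, since the text does not state how overlaps are merged, and the function names are hypothetical.

```python
import numpy as np

def tile(image, size=256, n=7):
    """Split a square image into an n x n grid of overlapping size x size tiles."""
    stride = (image.shape[0] - size) // (n - 1)      # 1024 -> stride 128
    return [((r, c), image[r:r + size, c:c + size])
            for r in range(0, stride * n, stride)
            for c in range(0, stride * n, stride)]

def splice(tiles, full_shape, size=256):
    """Re-assemble tiles, averaging values where tiles overlap (assumed rule)."""
    acc = np.zeros(full_shape)
    cnt = np.zeros(full_shape)
    for (r, c), t in tiles:
        acc[r:r + size, c:c + size] += t
        cnt[r:r + size, c:c + size] += 1
    return acc / np.maximum(cnt, 1)

def mae(pred, truth):
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(truth))))

def rmse(pred, truth):
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(truth)) ** 2)))
```

In use, each tile would be passed through the trained network before splicing; the MAE and RMSE would then be computed between the predicted and true particle depths.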
Among the pollen species, corn pollen gives the best results for both the SAU-net and the LAF methods, which may be due to the simple morphology of corn pollen particles. For peach pollen and pine pollen, which have more complex geometric profiles, the results are slightly poorer. The maximum prediction error of the SAU-net is 43 μm, for pine pollen at z = 1200 μm, while that of Sog-LAF is 120 μm, for peach pollen at z = 1800 μm. Although the Sobel-LAF method outperforms Sog-LAF, it is still inferior to the proposed SAU-net, especially at large axial depths. Benefiting from the complex nonlinear fitting ability of the network, the axial distances predicted by the SAU-net are closer to the truth, which shows that the shape-learning network achieves high accuracy in axial depth prediction. If higher accuracy is desired, fine-tuning measures such as adjusting the network parameters, introducing additional training data, and optimizing the loss function could be used.

Shape Reconstruction Performance
The shape profile of an individual particle is recovered from its holographic pattern via the wavelet-based method of Equation (3). Experimental results for particles of different pollen species are shown in Figure 5, including the binarized shape profiles retrieved by MSLH, the images recovered by the ASM of Equation (1), and microscopic images of the corresponding species. As shown in Figure 5, both MSLH and the ASM effectively reconstruct the 2D shapes, in good agreement with the microscopic images, indicating that our method is effective for shape reconstruction.
To quantitatively analyze the shape reconstruction performance of the proposed MSLH, we used the structural similarity (SSIM), RMSE, and peak signal-to-noise ratio (PSNR) as evaluation indexes describing the similarity between the recovered binarized shape image I_MSLH and the ground truth I_tr. [45,46] These indexes are calculated as

$$\mathrm{SSIM}(I_{MSLH}, I_{tr}) = \frac{(2\mu_{MSLH}\mu_{tr} + C_1)(2\sigma_{MSLH,tr} + C_2)}{(\mu_{MSLH}^2 + \mu_{tr}^2 + C_1)(\sigma_{MSLH}^2 + \sigma_{tr}^2 + C_2)}$$

$$\mathrm{RMSE}(I_{MSLH}, I_{tr}) = \sqrt{\frac{1}{N_x N_y} \sum_{x=1}^{N_x} \sum_{y=1}^{N_y} \left[I_{MSLH}(x, y) - I_{tr}(x, y)\right]^2}$$

$$\mathrm{PSNR}(I_{MSLH}, I_{tr}) = 10 \times \log_{10}\left(1/\mathrm{RMSE}^2\right)$$

where μ and σ² are the mean and variance of a grayscale image, σ_MSLH,tr is the covariance of the two images indicated by the subscripts, C_1 and C_2 are two constants, and N_x and N_y are the numbers of pixels in the rows and columns of the image. These three indexes have different meanings and value ranges. SSIM evaluates the similarity between two images and reaches its maximum of 1 for identical images; the closer it is to 1, the more similar the images (i.e., the better the reconstruction). RMSE evaluates the difference between two images; the closer it is to 0, the smaller the difference. PSNR measures the signal-to-noise ratio of the recovered image; the larger the PSNR, the better the reconstruction.

Figure 5. Shape recovery effects of MSLH (left column, binarized) and ASM (middle column). Microscopic images (right column) of the corresponding pollen species are given for comparison. The feature shapes of three species of pollen particles are shown, demonstrating the identification and classification abilities of the proposed MSLH.

Figure 6. Shape recovery results of particle fields containing 5, 15, and 25 particles, respectively, representing increasing concentrations. The binarized shape images from the proposed MSLH and those from two LAF methods are given for comparison, together with the truth images and holograms. The error images are the differences between the recovered images and the corresponding truth images.

Calculation speed is a distinct advantage of MSLH. To demonstrate this, particle fields with increasing concentrations are considered: 5, 15, and 25 particles are placed in a particle field, representing three different concentrations. Since the ground truth of a dynamic 3D dispersed particle field is unattainable, the evaluation is performed in simulation, as shown in Figure 6. The particles are randomly selected from 300 simulated pollen particles generated via moment recombination, and their axial positions are randomly assigned within 1-3.55 mm from the sensor with an axial spatial resolution of 0.01 mm. Other parameters and data processing methods are the same as in the previous section. Figure 6 gives an example of the results, in which the images recovered by the two LAF methods are also given for comparison, and the error images display the absolute value of the difference between the recovered image and the truth image. Compared with the LAF methods, the shapes reconstructed by our method are closer to the truth, and the method maintains better reconstruction ability as the concentration increases. Statistically, ten different particle fields are randomly simulated for each concentration, and the mean values of SSIM, RMSE, and PSNR are listed in Table 2, together with the total calculation times of our method and the two LAF methods. The SSIM, PSNR, and RMSE of the proposed MSLH are all better than those of Sog-LAF and comparable to those of Sobel-LAF at each concentration. The essential highlight of MSLH is that its calculation is several times faster than that of the LAF methods. This improvement comes from the pretrained SAU-net, in which the nonlinear relationship between hologram and depth has been learnt from a large amount of data. Another phenomenon in Table 2 is that all indexes degrade as the particle concentration increases. Higher concentration causes mutual interference between particle holograms, which worsens the axial position prediction and, in turn, the shape reconstruction performance.
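The three indexes can be computed as in the sketch below. It follows the global form given above, with the means, variances, and covariance taken over the whole image rather than over local windows as in the common windowed SSIM, and it assumes images scaled to [0, 1]. The constants C_1 = (0.01)² and C_2 = (0.03)² are a common choice for unit dynamic range, not values reported in this article.

```python
import numpy as np

def global_ssim(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM with means, variances and covariance taken over the whole image."""
    mu_a, mu_b = a.mean(), b.mean()
    cov = np.mean((a - mu_a) * (b - mu_b))
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2)
    return float(num / den)

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

def psnr(a, b):
    """PSNR = 10 log10(1 / RMSE^2) for images with values in [0, 1]."""
    e = rmse(a, b)
    return float("inf") if e == 0 else 10.0 * np.log10(1.0 / e ** 2)
```

For instance, a uniform error of 0.1 gives RMSE = 0.1 and PSNR = 20 dB, while identical images give SSIM = 1.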
The measurable concentration depends on many factors of the system, such as the network parameters, loss functions, and datasets. In dataset preparation, we set the upper limit of the particle number to 3 in a field of 256 × 256 pixels, so the highest concentration is about 48 particles in a full field of 1024 × 1024 pixels. Nevertheless, particle concentration affects our method less than it affects both the Sog-LAF and Sobel-LAF methods, demonstrating the robustness of our approach. All statistics in Table 2 are averages from ten tests with an image size of 1024 × 1024 pixels. The operation platform is an 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40 GHz.

Classification Results
The proposed MSLH realizes shape reconstruction with high-quality features. As examples, three species of pollen bioparticles, pine, peach, and corn, with a "love shape", "triangle shape", and "round shape", respectively, are considered. These three pollen species were selected because they are respiratory allergens, implying potential public health applications. Their shape profiles have intuitive characteristics for identification, whereas manual classification is error-prone and labor-consuming. To verify the classification effect of the proposed all-computational MSLH, the widely used Yolo v5 network is introduced at the terminal of MSLH; the implementation follows the open-source code on GitHub. [47] The number of epochs is set to 300, and the other parameters remain unchanged. During training, depth-encoded shape images and the corresponding labels of the three pollen species are input into the network. The dataset consists of 200 images of 1024 × 1024 pixels with a concentration of 15 particles per frame; thus, 1000 particles of each species are used for network training. The particles are dispersed in distilled water, and their sizes range from 20 to 50 μm. The temperature and humidity during the experiment are about 20 °C and 74%. The platform is the same as that used for the SAU-net, and the training time is about 0.7 h. Finally, the trained classifier is tested with an experimental hologram, as shown in Figure 7.
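Training the Yolo v5 classifier requires a label file for each image. A minimal sketch for converting per-particle binary masks into YOLO-format label lines (one `class x_center y_center width height` row per object, normalized by the image size) is given below; how the authors produced their labels is not stated, and the function name and class indices (for example, 0 for pine, 1 for peach, 2 for corn) are hypothetical.

```python
import numpy as np

def yolo_label_lines(particle_masks, class_ids):
    """One 'class x_center y_center width height' line per particle, normalized to [0, 1]."""
    lines = []
    for mask, cls in zip(particle_masks, class_ids):
        rows, cols = np.nonzero(mask)
        if rows.size == 0:                               # skip empty masks
            continue
        h_img, w_img = mask.shape
        x0, x1 = cols.min(), cols.max() + 1              # box edges in pixels (exclusive max)
        y0, y1 = rows.min(), rows.max() + 1
        xc, yc = (x0 + x1) / 2 / w_img, (y0 + y1) / 2 / h_img
        bw, bh = (x1 - x0) / w_img, (y1 - y0) / h_img
        lines.append(f"{cls} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")
    return lines
```

Because the simulated fields are built from individually placed shape masks, the per-particle masks needed here would already be available during dataset generation.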
It should be noted that the algorithm-associated spatial resolution of the computational imaging system is mainly constrained by the camera pixel size, that is, 1.85 μm.[48] Since pollen particles are several tens of micrometers in size and their shape types are simple, the imaging system is sufficient to capture the feature shapes. In the experiment, the three kinds of pollen particles are dispersed in the microchannel with a 3D distribution. The raw 2048 × 2048 pixel hologram is captured as shown in Figure 7a, and the corresponding shape image reconstructed by MSLH is given in Figure 7b; this image is then input into the Yolo classifier, which outputs the classification results shown in Figure 7c,d. The results mark the three species with boxes of different colors, each annotated with its classification confidence rate.
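To make the sampling argument concrete: at 1.85 μm per pixel, the 20 to 50 μm pollen particles span roughly 11 to 27 pixels across. The minimal sketch below computes this. The two-samples-per-feature threshold in the hypothetical `is_resolvable` helper is an assumed rule of thumb, not a criterion from the paper, and it ignores diffraction and reconstruction effects.

```python
PIXEL_UM = 1.85  # camera pixel pitch stated in the text


def span_px(size_um: float, pixel_um: float = PIXEL_UM) -> float:
    """Number of camera pixels spanned by a feature of the given size."""
    return size_um / pixel_um


def is_resolvable(size_um: float, pixel_um: float = PIXEL_UM, min_px: float = 2.0) -> bool:
    """Rule-of-thumb check: a feature needs at least `min_px` samples across it.

    The 2-pixel threshold is an assumption (Nyquist-style), not from the paper.
    """
    return span_px(size_um, pixel_um) >= min_px


for size in (20.0, 50.0):
    print(f"{size:.0f} um particle spans {span_px(size):.1f} px")
```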
Figure 7 shows that all three species are well recognized and classified. The average confidence rates of pine, peach, and corn pollen particles reach 94.90%, 93.68%, and 93.04%, respectively. Moreover, the classification inference time for such a 2048 × 2048 pixel field is only 0.393 s, which indicates that the proposed MSLH combined with the Yolov5 classifier can reconstruct high-quality feature shapes and rapidly classify characteristic particles. The dataset and network of the proposed MSLH can be downloaded from ref. [49].
The confidence error arises from several sources, such as noise from impurities and bubbles. The perspective problem in shape reconstruction is another important source. The particles in the experiment are randomly dispersed in water, so they have random orientation angles and their captured 2D projected holograms are deformed, just as a circle deforms into an ellipse when viewed from an oblique perspective. This phenomenon may affect object identification to some extent, which also explains why the recognition rate cannot reach 100%. In practice, however, this phenomenon has little influence on species classification, because the shape features are retained, especially for particles with distinctive shapes. For example, the 2D projected shape of a pine pollen particle with a "love shape" varies with perspective, but the "love" feature of the shape remains at most perspectives.
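The circle-to-ellipse example can be made quantitative under two simplifying assumptions: the particle outline is treated as flat, and the hologram records an orthographic projection. Tilting by an angle θ then compresses the outline along one axis by cos θ. For θ < 90° this is an invertible linear map, which preserves convexity, so a non-convex outline such as the notched "love shape" stays non-convex. The minimal NumPy sketch below checks the cos θ compression on a circle; the function names are hypothetical.

```python
import numpy as np


def tilt_and_project(contour_xy: np.ndarray, tilt_rad: float) -> np.ndarray:
    """Tilt a flat outline (lying in z = 0) about the x-axis by `tilt_rad`
    and project it orthographically onto the sensor (x-y) plane.

    Rotation about x maps (x, y, 0) -> (x, y cos t, y sin t); dropping z
    leaves (x, y cos t), i.e. a pure compression of the y-axis.
    """
    x, y = contour_xy[:, 0], contour_xy[:, 1]
    return np.column_stack([x, y * np.cos(tilt_rad)])


def aspect_ratio(contour_xy: np.ndarray) -> float:
    """Height-to-width ratio of the outline's bounding box."""
    width = contour_xy[:, 0].max() - contour_xy[:, 0].min()
    height = contour_xy[:, 1].max() - contour_xy[:, 1].min()
    return height / width


# A circular outline of radius 25 um, sampled every 1 degree.
t = np.linspace(0.0, 2.0 * np.pi, 361)
circle = 25.0 * np.column_stack([np.cos(t), np.sin(t)])

for deg in (0, 30, 60):
    ellipse = tilt_and_project(circle, np.deg2rad(deg))
    print(deg, round(aspect_ratio(ellipse), 3))
```

Real pollen grains are 3D, so this flat-outline model only captures the first-order foreshortening effect described in the text.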
Benefiting from its single-shot reconstruction ability, the proposed MSLH can be adopted to analyze dynamic particles flowing in any airborne or aqueous medium. The maximum measurable flow velocity of the medium is determined by the capability of the imaging system, including the light source, exposure time, camera noise, and so on. In this work, a continuous light source is used, and the minimum exposure time of the camera is 266 μs. Under the condition that a moving particle streaks to form a one-pixel tail (1.85 μm), the particle velocity is about 7 mm s⁻¹, which is the maximum measurable flow velocity. The 3D range that can be imaged is determined by the area of the camera sensor (4024 × 3036 pixels, 1.85 μm pixel⁻¹). Within the channel depth of 3 mm, it is about 83.6 mm³. The maximum measurable concentration is about 48 particles in a full field of 1024 × 1024 pixels (a volume of 7.2 mm³), that is, about 6.7 particles per mm³.
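The flow-speed, volume, and concentration figures above are simple products of the camera parameters, and the minimal sketch below reproduces them. One caveat: with the full 3 mm channel depth, the 4024 × 3036 pixel sensor area would give about 125 mm³, whereas the quoted 83.6 mm³ (and 7.2 mm³ for a 1024 × 1024 field) correspond to an effective axial depth of about 2 mm. The `DEPTH_MM = 2.0` value is therefore our inference from those numbers, not a parameter stated in the text, and the helper names are hypothetical.

```python
def max_velocity_mm_s(pixel_um: float, exposure_us: float, blur_px: float = 1.0) -> float:
    """Speed at which a particle smears `blur_px` pixels during one exposure.

    um/us equals m/s, so multiply by 1e3 to get mm/s.
    """
    return blur_px * pixel_um / exposure_us * 1e3


def imaged_volume_mm3(nx: int, ny: int, pixel_um: float, depth_mm: float) -> float:
    """Lateral field of nx x ny pixels times an axial depth, in mm^3."""
    return (nx * pixel_um * 1e-3) * (ny * pixel_um * 1e-3) * depth_mm


PIXEL_UM = 1.85
v_max = max_velocity_mm_s(PIXEL_UM, exposure_us=266.0)

# Assumed effective depth: 2 mm reproduces the quoted 83.6 and 7.2 mm^3.
DEPTH_MM = 2.0
sensor_vol = imaged_volume_mm3(4024, 3036, PIXEL_UM, DEPTH_MM)
field_vol = imaged_volume_mm3(1024, 1024, PIXEL_UM, DEPTH_MM)
concentration = 48 / field_vol  # particles per mm^3

print(f"{v_max:.2f} mm/s, {sensor_vol:.1f} mm^3, {field_vol:.2f} mm^3, "
      f"{concentration:.1f} per mm^3")
```

Allowing a longer streak (`blur_px > 1`) or a shorter exposure (for example, a pulsed source) raises `v_max` proportionally.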

Conclusion
In this article, we propose a compact single-shot computational imaging system, named MSLH, for fast shape reconstruction and classification of microparticles dispersed in 3D space. The depth-encoded shape-learning network in MSLH is a U-net with a self-attention mechanism, which quickly and accurately obtains the particles' axial depths from the hologram. In the network construction, Zernike moments are used for shape feature characterization and extraction. By decomposing and recombining the Zernike moments of particle shapes within different species, dataset acquisition for network training becomes efficient and convenient, requiring only a small amount of experimental data. The subsequent shape reconstruction algorithm in MSLH is based on the wavelet transform, which makes the whole MSLH process a hybrid data-and-model-driven approach with more explicit physical meaning. Results for particle fields of increasing concentration demonstrate that MSLH offers strong robustness, short computation time, and high accuracy in recovering the shapes of 3D-dispersed particles. The open-source Yolov5 network is introduced at the terminal of MSLH to verify the classification performance, using samples of three pollen species, that is, pine, peach, and corn pollen particles. The results show that the particles can be successfully classified with good confidence rates. These improvements demonstrate that MSLH is an easy-to-set-up, efficient-to-construct, and fast-to-output approach for shape-based classification. We believe that MSLH will be a promising method for the identification, recognition, and classification of 3D-dispersed microparticles in dynamic fluids and may find broad applications in fields including air quality detection, marine environment monitoring, and biological system diagnosis.
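The decomposition-and-recombination idea can be illustrated with standard complex Zernike moments on the unit disk, Z_nm = (n+1)/π Σ f(x, y) V*_nm(ρ, θ) ΔA, with V_nm = R_nm(ρ) e^{imθ}. The NumPy sketch below computes the moments of a binary shape image, reconstructs a shape from a moment set, and blends two moment sets with a convex weight w. It is a minimal illustration under our own assumptions: the linear blending rule, the maximum order, and all function names are hypothetical and are not taken from the MSLH implementation.

```python
from math import factorial

import numpy as np


def radial_poly(n: int, m: int, rho: np.ndarray) -> np.ndarray:
    """Zernike radial polynomial R_n^|m|(rho); requires n - |m| even."""
    m = abs(m)
    out = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * factorial(n - s)
             / (factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s)))
        out += c * rho ** (n - 2 * s)
    return out


def unit_disk_grid(N: int):
    """Polar coordinates of N x N pixel centres mapped onto [-1, 1]^2."""
    coords = (2 * np.arange(N) + 1 - N) / N
    x, y = np.meshgrid(coords, coords)
    rho = np.hypot(x, y)
    return rho, np.arctan2(y, x), rho <= 1.0


def zernike_moments(img: np.ndarray, max_order: int) -> dict:
    """Decompose a square image into complex Zernike moments Z_nm up to max_order."""
    N = img.shape[0]
    rho, theta, inside = unit_disk_grid(N)
    dA = (2.0 / N) ** 2  # area of one pixel in unit-disk coordinates
    moments = {}
    for n in range(max_order + 1):
        for m in range(-n, n + 1):
            if (n - abs(m)) % 2:
                continue  # Z_nm is defined only for n - |m| even
            V = radial_poly(n, m, rho) * np.exp(1j * m * theta)
            moments[(n, m)] = (n + 1) / np.pi * np.sum(img * np.conj(V) * inside) * dA
    return moments


def reconstruct(moments: dict, N: int) -> np.ndarray:
    """Approximate image f ~ sum Z_nm V_nm (real part) on an N x N grid."""
    rho, theta, inside = unit_disk_grid(N)
    f = np.zeros((N, N))
    for (n, m), z in moments.items():
        f += np.real(z * radial_poly(n, m, rho) * np.exp(1j * m * theta))
    return f * inside


def recombine(mom_a: dict, mom_b: dict, w: float) -> dict:
    """Hypothetical blending rule: convex combination of two moment sets."""
    return {k: w * mom_a[k] + (1 - w) * mom_b[k] for k in mom_a}
```

Thresholding `reconstruct(recombine(Za, Zb, w), N)` for several values of w could generate intermediate outlines between two measured shapes; whether such blends remain realistic for a given species is a modeling choice that would need experimental checking.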