Multispectral Microscopic Multiplexed (3M) Imaging of Atomically‐Thin Crystals Using Deep Learning

Ultra-thin atomic crystals are promising for fabricating next-generation photonic and optoelectronic devices. Wafer-scale characterization techniques are highly desired for efficient and accurate thickness identification of these crystals and their heterostructures. Optical contrast between atomic crystals and substrates, described by Fresnel theory, is a key technique for thickness identification. Both RGB color information and spectroscopic information have been explored for layer number counting and implemented with machine learning algorithms that extract features from large amounts of data. In this work, a multispectral microscopic method combining hardware design and deep-learning algorithms is developed. Restoration of multispectral images degraded during large-area scanning by drifts of the optical imaging modality, and automated layer number identification from multispectral grayscale images, are studied using two deep learning models: a generative adversarial network (GAN) and a 3D U-Net. These models are trained on custom-built multispectral data sets and evaluated quantitatively with several indicators (Dice coefficient, confusion matrix, structural similarity). After training and testing, the models are integrated into a graphical user interface for on-site identification. The developed method provides a framework for 3D data reconstruction and segmentation from multispectral images and can be implemented for wafer-scale characterization of heterostructures containing different species of ultra-thin atomic crystals.


Introduction
[15,16] However, their temporal resolution and field of view are limited for wafer-scale searching and monitoring. Optical microscopy is the most efficient way to balance field of view and spatial resolution. The identification of ultra-thin graphene layers based on Fresnel's law has been widely utilized. [17,18,21-24] Searching for and monitoring ultra-thin atomic layers are being transformed from manual work to well-trained neural network models, motivating the use of robotic vision. [25,26,29,30] Although this technique shows good performance when identifying a single type of ultra-thin atomic crystal with few layers, [31] it is challenging to apply it to heterostructures in which several types of crystals with varying layer numbers are stacked. [34] Multispectral imaging microscopy acquires more abundant spectral information than conventional RGB color microscopy, [35,36] and is thus a promising technique for the characterization of ultra-thin atomic crystals and their heterostructures. The visibility and detection of mono- and few-layer atomic crystals can be realized on the basis of the contrast between the crystals and the substrate at a series of specific wavelengths. The relation between layer number and optical contrast is physically described by Fresnel theory; once the optical contrast at different wavelengths is experimentally acquired, the layer number can be calculated. Out-of-focus images, however, appear in large-area and wafer-scale scanning. [39-43] However, multispectral imaging microscopy combined with deep learning algorithms for the efficient identification of ultra-thin atomic crystals and their heterostructures has not been widely explored.
In this work, we developed a multispectral microscopy framework including the acquisition of images at a series of wavelengths, automated single-shot multispectral image refocusing, and multispectral image segmentation using deep learning methods for multiplexed imaging of ultra-thin atomic crystals. MoS2 fabricated by chemical vapor deposition on a SiO2/Si substrate with a SiO2 thickness of 270 nm was used for demonstration. Multispectral images at six wavelengths from 500 to 600 nm with a step of 20 nm in the visible range were captured, reflecting the variation of the optical contrast between the atomic crystals and the substrate. Automated single-shot multispectral image refocusing was achieved by a generative adversarial network (GAN) model, and segmentation of the refocused 3D multispectral images was achieved by a 3D U-Net model. [44,45] The performances of these models in processing multispectral image data were quantitatively evaluated. The GAN model recovered out-of-focus multispectral data with high structural similarity to the in-focus images. The combination of the GAN model and the 3D U-Net model reconstructed and segmented multispectral images acquired under varying illumination and mechanical drift conditions, demonstrating the potential for on-site multiplexed imaging with high accuracy and reliability.

Principle of Multispectral Microscopic Image Autofocusing and Segmentation
Figure 1 illustrates the workflow of the multispectral microscopic multiplexed imaging of ultra-thin atomic crystals based on deep learning methods. Figure 1a shows the diagram of the multispectral microscope, which was modified with a filter series and a monochromatic camera. The multispectral microscope can acquire whole field-of-view images of the sample from 500 to 600 nm with a wavelength step size of 20 nm. The spectral information of the light source and the filters can be found in Figure S1 (Supporting Information). The sample can be moved in the x, y, and z directions by controlling the stage. Figure 1b shows the acquired multispectral images of a fixed region at wavelengths from 500 to 600 nm and the corresponding RGB image. Figure 1c elaborates the theoretical basis of layer number counting based on optical contrast. Fresnel's law describes the transmittance and reflectance of light in a multilayer model constructed from air, the ultra-thin atomic crystal, and the substrate. In this work, deep learning methods were utilized to automatically distinguish the layer number based on a well-trained model and a custom-built data set.
After building the imaging setup to acquire the multispectral data, the workflow was divided into two stages. The first stage, illustrated in Figure 1d-f, deals with the defocusing of multispectral microscopic images. Out-of-focus images contain blurred structure and color information and may lead to wrong predictions and misclassification of species; such defocusing can occur due to mechanical movement and changing temperatures. To address this challenge, the stage was axially scanned to acquire defocused images at each wavelength, so the experimental data contain both out-of-focus and in-focus multispectral images. A GAN model was used to restore the out-of-focus images at specific wavelengths, transforming them from blurred images into clear ones. The second stage, illustrated in Figure 1g-j, deals with the multiplexed imaging of different species, meaning atomic crystals with varying layer numbers, since their refractive indices differ. A deep learning architecture, namely a 3D U-Net model, was utilized to segment the atomic crystals with different thicknesses (monolayer, bilayer, trilayer, multilayer, and bulk) based on the multispectral images containing six spectral channels. The well-trained model was integrated into the user interface for on-site use.

Multispectral Image Correction by Nonlinear Normalization
Light illumination conditions influence the structure and color information of the acquired images, and changes in illumination conditions may cause blurred images of the sample. The multispectral images were therefore corrected before any processing method was applied. First, a series of in-focus multispectral data of the sample was collected and used to build a data set; these images were captured under the same working conditions and utilized as inputs to train a segmentation network. Second, newly-acquired in-focus multispectral data were corrected by linear and nonlinear methods. Third, the corrected multispectral image was fed into the pre-trained segmentation network, and the prediction was evaluated to assess the performance of the linear and nonlinear normalization algorithms.
Figure 2 illustrates the process of linear and nonlinear normalization and a comparison of their performance. An image at the wavelength of 540 nm from newly-acquired multispectral data was captured. When this image was directly fed into a segmentation network trained on a data set containing images collected under the same condition as Figure 2b, the network wrongly predicted the distribution of the different species (Figure S2, Supporting Information). The pre-trained segmentation network worked well only when taking images collected under the same condition as Figure 2b as inputs. Therefore, both linear and nonlinear normalization methods were applied to the newly-acquired images, including that in Figure 2a. Differences between the histograms of Figure 2a,b were found since these images were acquired under different conditions. The normalization of images was based on the histogram analysis of the newly-acquired image (Figure 2a) and the reference image (Figure 2b).
Assuming that the brightness of the substrate remains the same under the same light conditions, the mean pixel value of the substrate indicates the illumination condition. To convert the dataset to an illumination condition similar to that of the reference set, mean shift (a linear preprocessing transform) and gamma contrast (a nonlinear method based on a power function) were applied to the original data.
The mean shift method was based on the assumption that the pixel value distribution of each channel could be regarded as a normal distribution. The transforming formula is

V_out = V_in - ΔV    (1)

where ΔV is the difference between the mean substrate value of the original image and the mean substrate value of the reference dataset, and V_out and V_in are the output and input pixel values of the normalized and original images, respectively. All pixel values in the image are shifted by the same offset. The histogram in Figure 2a was shifted to the histogram in Figure 2c, with the distribution of pixel values closer to the reference set in Figure 2b. The gamma correction method was performed as

V_out = V_in^γ    (2)

where the pixel values are normalized to [0, 1] and γ is used as a variable to adjust the contrast of the image.
In general, for normalized pixel values, a gamma value smaller than 1 makes the image lighter, while a gamma value larger than 1 renders it darker. To adjust the contrast of the dataset to match the reference dataset, the gamma value was calculated according to Equation 2 using the substrate means, and the gamma correction was then applied to the whole dataset channel by channel. A comparison of the predicted results in Figure 2e,f shows that the nonlinear (gamma contrast) normalization method had higher performance than the linear (mean shift) method. The pre-trained segmentation model functioned well for all the normalized multispectral images captured under different illumination conditions.
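The two corrections can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: `substrate_mask` and `ref_substrate_mean` are hypothetical inputs (a boolean mask of substrate pixels and the reference substrate mean), and the gamma is obtained by solving Equation 2 for the substrate means, which is an assumed form.

```python
import numpy as np

def mean_shift(channel, substrate_mask, ref_substrate_mean):
    """Linear correction (Equation 1): shift every pixel so the
    substrate mean matches the reference substrate mean."""
    delta_v = channel[substrate_mask].mean() - ref_substrate_mean
    return channel - delta_v

def gamma_correct(channel, substrate_mask, ref_substrate_mean, max_val=255.0):
    """Nonlinear correction (Equation 2): choose gamma so the normalized
    substrate mean maps onto the normalized reference substrate mean."""
    v = channel / max_val                        # normalize to [0, 1]
    sub_mean = v[substrate_mask].mean()
    ref_mean = ref_substrate_mean / max_val
    gamma = np.log(ref_mean) / np.log(sub_mean)  # solve ref = sub^gamma
    return (v ** gamma) * max_val

# Toy example: a channel whose substrate is brighter than the reference.
rng = np.random.default_rng(0)
channel = rng.uniform(120, 140, size=(4, 4))
mask = np.ones((4, 4), dtype=bool)
corrected = gamma_correct(channel, mask, ref_substrate_mean=100.0)
```

After correction, the substrate mean lands close to the 100.0 reference; the mean shift matches it exactly, while the gamma correction matches it approximately (the power law acts on each pixel, not on the mean directly).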

Multispectral Out-Of-Focus Image Restoration Using GAN
Precise and correct layer prediction or thickness measurement of ultra-thin atomic crystals relies on the high quality of the captured multispectral images. Acquiring high-quality multispectral microscopic images is a critical step to avoid wrong predictions by the pre-trained network models, which are trained on high-quality in-focus multispectral images. Figure 3 illustrates the out-of-focus multispectral image restoration using the Pix2Pix architecture. Similar to the classic GAN architecture, Pix2Pix contains a generator that generates multispectral data based on the inputs and a discriminator that determines whether the generated images differ from the ground truth (in-focus) images, until the generated and in-focus images cannot be distinguished by the discriminator (Figure S3, Supporting Information). [46,47] In the training process of the GAN model, paired out-of-focus and in-focus images were fed into the network to learn to generate appropriately sharp results, with the in-focus images selected as references (Figure 3a). For generalizability, the GAN model was trained with datasets covering out-of-focus multispectral images at different defocus distances. The out-of-focus images were acquired using a motorized vertical stage to move the sample from the focal plane of the microscope objective with a step size of 2 μm. When using one spectral filter, images were captured by scanning the defocus distances with the vertical stage, and the manually selected in-focus image was set as the ground truth in the dataset. The same step was repeated for the other spectral filters.
To evaluate the restoration performance of the GAN model, an out-of-focus image at the wavelength of 540 nm from the multispectral image set was examined. The generated image (Figure 3c) shows clear structural information compared to the out-of-focus image at the same wavelength (Figure 3b) and closely matches the experimentally captured in-focus image at 540 nm (Figure 3d). Figure 3e shows the quantitative results for the original and refocused images at a single channel (540 nm). Structural similarity (SSIM) was used to measure the similarity of the synthesized in-focus images to the reference in-focus images at each wavelength. [48] Over the whole axial scanning range from -20 to 20 μm, the SSIM value of the refocused image was higher than that of the original out-of-focus image. To evaluate the performance more generally, the mean SSIM of all test regions at the six multispectral channels from 500 to 600 nm was computed at different degrees of defocus (Figure 3f); for SSIM, a higher value indicates better performance. The GAN model thus significantly restored the structural information of the multispectral images in the spectral range from 500 to 600 nm over a defocus range from -20 to 20 μm. The loss function was set to MS-SSIM-L1 to capture more structural information during training, which was beneficial for recovering structural information in the prediction process of the model.
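To make the SSIM metric concrete, here is a single-window (global) SSIM in plain NumPy, a deliberate simplification of the sliding-window SSIM typically used for evaluations like the one above (e.g. `skimage.metrics.structural_similarity`). The constants follow the standard SSIM definition; the blurred test image is a synthetic stand-in for an out-of-focus capture.

```python
import numpy as np

def global_ssim(img1, img2, data_range=255.0):
    """SSIM computed over the whole image in one window -- a coarse
    stand-in for the windowed SSIM used in restoration evaluation."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu1, mu2 = img1.mean(), img2.mean()
    var1, var2 = img1.var(), img2.var()
    cov = ((img1 - mu1) * (img2 - mu2)).mean()
    return ((2 * mu1 * mu2 + c1) * (2 * cov + c2)) / (
        (mu1 ** 2 + mu2 ** 2 + c1) * (var1 + var2 + c2))

# Identical images score 1.0; a blurred (defocus-like) copy scores lower.
rng = np.random.default_rng(1)
ref = rng.uniform(0, 255, size=(64, 64))
blurred = (ref + np.roll(ref, 1, axis=0) + np.roll(ref, -1, axis=0)) / 3
```

In a per-channel evaluation such as Figure 3e,f, one would compute this score between each restored channel and its in-focus reference, then average over channels and test regions.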

Multispectral Multiplexed Imaging Using the 3D U-Net
Multiplexed imaging here denotes the segmentation of pixels belonging to different categories (thicknesses) based on the multispectral image set, which contains several greyscale channel images. Figure 4 illustrates the segmentation of the restored multispectral images at different degrees of defocus based on the 3D U-Net architecture. Figure 4a-c illustrates the process of the 3D U-Net model for layer number segmentation using the full-FOV multispectral image set. The full-FOV images (1024 × 768 × 6) were split into paired data sets of a smaller size (96 × 96 × 6), predicted by the 3D U-Net, and then merged in the original splitting sequence to reconstruct the color-coded segmentation of the different categories (substrate, monolayer, bilayer, trilayer, multilayer, and bulk). The 3D U-Net model was trained using the in-focus high-quality multispectral data set, which contained 1700 multispectral image pairs with balanced proportions of monolayer, bilayer, trilayer, and multilayer images obtained through data augmentation. The structure of the 3D U-Net is shown in Figure S4 (Supporting Information).
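The split-predict-merge step can be sketched as follows. This is an illustrative NumPy tiling scheme, not the authors' exact code: it zero-pads the borders so the tile size divides the image evenly (how the paper handles the non-divisible 1024-px dimension is not stated), and the merge inverts the split exactly.

```python
import numpy as np

def split_tiles(volume, tile=96):
    """Split an H x W x C multispectral stack into tile x tile x C patches,
    zero-padding the borders so H and W divide evenly by the tile size."""
    h, w, c = volume.shape
    ph, pw = -h % tile, -w % tile
    padded = np.pad(volume, ((0, ph), (0, pw), (0, 0)))
    hh, ww = padded.shape[0] // tile, padded.shape[1] // tile
    tiles = (padded.reshape(hh, tile, ww, tile, c)
                   .transpose(0, 2, 1, 3, 4)
                   .reshape(-1, tile, tile, c))
    return tiles, (hh, ww, h, w)

def merge_tiles(tiles, meta, tile=96):
    """Reassemble tiles in the original splitting sequence and crop
    back to the original height and width."""
    hh, ww, h, w = meta
    c = tiles.shape[-1]
    merged = (tiles.reshape(hh, ww, tile, tile, c)
                   .transpose(0, 2, 1, 3, 4)
                   .reshape(hh * tile, ww * tile, c))
    return merged[:h, :w]

# A 768 x 1024 x 6 stack splits into 8 x 11 = 88 tiles of 96 x 96 x 6.
stack = np.random.default_rng(2).uniform(size=(768, 1024, 6))
tiles, meta = split_tiles(stack)
```

In the real pipeline, the 3D U-Net would run on `tiles` (predicting a label per pixel) before `merge_tiles` reconstructs the full-FOV segmentation map.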
Figure 4d,e shows the performance of the pre-trained 3D U-Net model on the in-focus multispectral data set. From the perspective of computer vision, the multiplexed imaging task is essentially a segmentation task. Two indicators, the Dice coefficient and the confusion matrix, were employed to evaluate the performance of the 3D U-Net model. The model achieved an average Dice coefficient of over 80% in all categories (Figure 4d). Furthermore, the confusion matrix allowed for analyzing into which categories pixels were misclassified; for example, bilayer pixels in the multispectral images might be misclassified as monolayer with a probability of 16.9% (Figure 4e). Hence, the 3D U-Net model performed well on the custom-built multispectral data sets containing monolayer, bilayer, trilayer, multilayer, and bulk categories, achieving an average accuracy (Dice coefficient) of >80% in all layer thickness categories.
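Both indicators are standard and easy to state precisely. The sketch below, assuming integer label maps (0 = substrate, 1 = monolayer, 2 = bilayer, ... as hypothetical label codes), computes a per-class Dice score and a row-normalized confusion matrix of the kind shown in Figure 4e, where each row gives the fraction of true-class pixels assigned to each predicted class.

```python
import numpy as np

def dice_coefficient(pred, truth, label):
    """Dice score for one category: 2|P ∩ T| / (|P| + |T|)."""
    p, t = (pred == label), (truth == label)
    inter = np.logical_and(p, t).sum()
    denom = p.sum() + t.sum()
    return 2.0 * inter / denom if denom else 1.0

def confusion_matrix(pred, truth, n_classes):
    """Row-normalized confusion matrix: row = true class,
    column = predicted class."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(truth.ravel(), pred.ravel()):
        cm[t, p] += 1
    return cm / cm.sum(axis=1, keepdims=True)

# Tiny worked example with three classes.
truth = np.array([[0, 0, 1, 1], [0, 2, 1, 1]])
pred  = np.array([[0, 0, 1, 2], [0, 2, 1, 1]])
print(round(dice_coefficient(pred, truth, 1), 3))   # -> 0.857
cm = confusion_matrix(pred, truth, 3)
print(cm[1, 2])   # fraction of true class-1 pixels predicted as class 2 -> 0.25
```

A misclassification rate like the 16.9% bilayer-to-monolayer figure above corresponds to a single off-diagonal entry of such a matrix.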

Implementation for the On-Site Multiplexed Imaging
After multispectral data normalization by nonlinear gamma correction, restoration by the GAN model, and segmentation by the 3D U-Net model, the performance of the GAN and 3D U-Net models for on-site multiplexed imaging was evaluated. The image reconstruction and segmentation performances of the single 3D U-Net model and of the cascaded GAN and 3D U-Net models were analyzed over the full defocus range (-20 to 20 μm).
Figure 5 illustrates the multiplexed imaging of multispectral images based on the implemented GAN and 3D U-Net models. The original multispectral image set (Figure 5a) was normalized by the nonlinear gamma correction and fed into the pre-trained 3D U-Net model (Figure 5b) and the cascaded GAN and 3D U-Net model (Figure 5c). The segmentation results of the multispectral image set after nonlinear normalization show the distribution of the different categories of ultra-thin atomic crystals. The segmentation result predicted by the cascaded GAN and 3D U-Net model (Figure 5c) preserves more detailed structural information than that of the single 3D U-Net model (Figure 5b). To compare the detailed segmentation results, three regions (A-C) were selected in the RGB image of the same region (Figure 5d). Figure 5e illustrates the segmentation results of these three regions using the single 3D U-Net model and the cascaded GAN and 3D U-Net models over the full defocus range from -20 to 20 μm. The cascaded models yielded clearer structures and fewer misclassified pixels. The combination of GAN and 3D U-Net for multispectral image restoration and segmentation is a promising technique for efficient on-site characterization of ultra-thin atomic crystals.
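The on-site pipeline is, structurally, a composition of the three steps above. The sketch below shows that composition with hypothetical stand-ins: `gan_refocus` and `unet_segment` are placeholder callables (in the real system these would be the loaded GAN generator and 3D U-Net), and `substrate_mask` / `ref_substrate_mean` are assumed normalization inputs as in the gamma-correction section.

```python
import numpy as np

# Placeholder networks -- identity refocusing and all-substrate labels;
# the real pipeline would load trained models here.
def gan_refocus(stack):
    return stack

def unet_segment(stack):
    return np.zeros(stack.shape[:2], dtype=int)

def multiplexed_imaging(stack, substrate_mask, ref_substrate_mean, cascade=True):
    """On-site pipeline: per-channel gamma normalization ->
    (optional) GAN refocusing -> 3D U-Net segmentation."""
    v = stack / 255.0
    out = np.empty_like(v)
    for ch in range(v.shape[-1]):
        sub = v[..., ch][substrate_mask].mean()
        gamma = np.log(ref_substrate_mean / 255.0) / np.log(sub)
        out[..., ch] = v[..., ch] ** gamma
    out *= 255.0
    if cascade:
        out = gan_refocus(out)      # restore out-of-focus channels
    return unet_segment(out)        # per-pixel thickness labels

stack = np.random.default_rng(3).uniform(100, 200, size=(96, 96, 6))
labels = multiplexed_imaging(stack, np.ones((96, 96), bool), 128.0)
print(labels.shape)   # -> (96, 96)
```

Comparing `cascade=True` against `cascade=False` mirrors the Figure 5 comparison between the cascaded models and the single 3D U-Net.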

Discussion and Conclusion
A multispectral imaging microscope with deep learning techniques was developed and implemented for the efficient and accurate characterization of ultra-thin atomic crystals. The nonlinear normalization method was used to correct the illumination under changing working conditions. A GAN model was developed to refocus the out-of-focus multispectral images, which appear due to mechanical scanning and drift, and its performance was quantitatively analyzed over the full defocus range from -20 to 20 μm. A 3D U-Net model trained with high-quality in-focus multispectral images was developed, achieving a Dice coefficient score of over 80%, to segment the refocused multispectral images and predict the distribution maps of ultra-thin atomic crystals with varying thicknesses. A multispectral data set with in-focus and out-of-focus images was custom-built to support the implementation of deep learning models in the study of multispectral image refocusing and segmentation. The cascaded GAN and 3D U-Net model reconstructed high-quality images from the out-of-focus multispectral data and further used these images for on-site multiplexed imaging with high accuracy and reliability. The developed method can be implemented for the characterization of heterostructures containing different species of ultra-thin atomic crystals. Hardware-based autofocus methods usually require complicated systems such as autofocus circuits, laser light sources, and additional light paths, while software-based autofocus methods generally scan axially for depth estimation. The GAN model, a deep neural network that generates recovered images from out-of-focus inputs with the original in-focus images as ground truth, requires no further hardware and can be readily cascaded with downstream neural network models for classification and segmentation. Future work includes combining the GAN and 3D U-Net models into one model, and model training using multispectral images of larger sizes for higher on-site processing efficiency.

Experimental Section
Multispectral Microscopic Image Acquisition: The multispectral microscope was built upon an optical microscope (DM750M, Leica). A motorized axial scanning stage was implemented to achieve a range of defocus distances. An RGB image of the sample region was obtained to determine the region to be captured. To generate multispectral images, six band filters (Thorlabs) were set in the light path of the microscope to obtain images at specific wavelengths (500, 520, 540, 560, 580, and 600 nm). To obtain image sequences with different out-of-focus extents, a motorized stage (OSMS40-5ZF, OptoSigma) was mounted to achieve vertical translation. The single-axis stage controller (GSC-01, OptoSigma) allowed for translation with a resolution up to 0.5 μm. A multispectral image set was captured every 2 μm. For images obtained by different filters at the same stage height, the out-of-focus extent could differ; thus, a real in-focus image set might contain images captured at different stage heights. A dataset should be composed of multispectral image sets of multiple regions with different out-of-focus extents. All images were 1536 × 2048 px in size and stored in RGB format. To summarize, MoS2 fabricated by chemical vapor deposition on a SiO2/Si substrate with a SiO2 thickness of 270 nm was used for demonstration. The multispectral images of the sample contain six specific wavelengths (500, 520, 540, 560, 580, and 600 nm). The defocus range of the sample from the focal plane of the microscope objective was set from -20 to 20 μm with a step size of 2 μm. In total, 20 sample regions comprising 620 multispectral image sets were captured experimentally.
Network Training Details: For the GAN model training, 20 regions of the sample were selected and captured to make up the whole dataset. The original dataset contained 620 multispectral image sets, all 1536 × 2048 px in size and stored in RGB format. Each multispectral data set contains six images at wavelengths from 500 to 600 nm. At each specific wavelength, one in-focus image corresponds to 20 out-of-focus images over the defocus range from -20 to 20 μm with a step size of 2 μm. The large-area images of 1536 × 2048 px were split into images of 96 × 96 px for training. For the 3D U-Net model training, each multispectral data set contains six greyscale images and one segmentation mask. The original images of 1024 × 768 × 6 px were split into small images of 96 × 96 × 6 px. After data augmentation, the data set for training the 3D U-Net model contains 1632 data pairs. Six categories, including the substrate, monolayer, bilayer, trilayer, multilayer, and bulk, were segmented pixel-by-pixel using the 3D U-Net model.

Figure 1. Workflow of accurate thickness determination using multispectral imaging and deep learning algorithms. a) The diagram of the multispectral microscope, which captures the greyscale images of the sample at different wavelengths. b) The captured multispectral images at wavelengths from 500 to 600 nm. The light source was fixed with no intensity change when capturing the multispectral images. The RGB image of the same region is also illustrated. c) The diagram of Fresnel's law, which is the basis for determining atomic crystal thickness using the optical contrast of the sample and the substrate. d-f) The first stage of the workflow, which restores the out-of-focus multispectral images at specific wavelengths using the GAN network. The axial scanning of the stage covers a distance range from -20 to 20 μm with a step size of 2 μm. A distance of 0 μm indicates that the sample was positioned at the focal plane of the optical setup. g-j) The second stage of the pipeline, which segments the regions belonging to different layer thicknesses (monolayer, bilayer, trilayer, multilayer, and bulk) based on a 3D U-Net architecture. The 3D U-Net model was integrated into the user interface for on-site use.

Figure 2. Linear and nonlinear correction of the multispectral data set. a) A newly-acquired image at the wavelength of 540 nm. b) A reference image at the wavelength of 540 nm. c) The image of (a) corrected by the linear correction algorithm. d) The image of (a) corrected by the nonlinear correction algorithm. e,f) The predictions of a pre-trained segmentation network trained using a series of data captured together with the data in (b). Scale bar = 100 μm.

Figure 3. Out-of-focus multispectral image (540 nm) restoration based on the GAN method. a) The principle of the GAN architecture. b) The out-of-focus image from the captured multispectral data, with an out-of-focus distance of -20 μm. c) The generated image using the GAN model from the generated multispectral data. d) The in-focus image from the experimentally captured images as reference. e) Partial evaluation using the SSIM of one of the original out-of-focus images and the refocused image over a wide range of defocus distances from -20 to 20 μm. The refocused images represent the generated images. f) Global evaluation using the SSIM of all the original out-of-focus images in the testing set and the refocused images at six wavelengths (500-600 nm). Scale bar = 100 μm.

Figure 4. Restored multispectral image segmentation based on the 3D U-Net architecture. a-c) The process of the 3D U-Net model for layer number segmentation using the full-FOV multispectral image set. The full-FOV images were split into paired data sets of a smaller size, predicted by the 3D U-Net, and then merged in the original splitting sequence to reconstruct the color-coded segmentation of the different categories (substrate, monolayer, bilayer, trilayer, multilayer, and bulk). d) The performance of the pre-trained 3D U-Net model for full-FOV multispectral image segmentation. The Dice coefficient was used to evaluate the segmentation results of all the layer thickness categories. e) The confusion matrix of the pre-trained 3D U-Net model shows the categories into which pixels were wrongly predicted. Scale bar = 100 μm.

Figure 5. Multiplexed imaging of multispectral images based on the implemented GAN and 3D U-Net models. a) The original full-FOV multispectral image set after nonlinear normalization. b) The segmentation results of species with varying thicknesses using the 3D U-Net model. c) The segmentation results of species with varying thicknesses using both the GAN and 3D U-Net models. d) The RGB image of the same region as in (a). e) The segmentation results of three regions in (d) using the single 3D U-Net model and the cascaded GAN and 3D U-Net models over the full defocus range (-20 to 20 μm). Scale bar = 100 μm.