Near‐Infrared InGaAs Intelligent Spectral Sensor by 3D Heterogeneous Hybrid Integration

The applications of near-infrared spectroscopy (NIRS) are limited by the bulky size, low integration, and poor edge-computing intelligence of traditional spectrometers. In this work, the authors develop an on-chip InGaAs intelligent spectral sensor, which consists of a linear variable filter for wavelength selection, a linear focal plane array as the detector, and a chip processor for neural-network inferring, based on an advanced 3D heterogeneous hybrid-integration approach. More than 200 spectral channels with a spectral resolution of 1.25% of the central wavelength are acquired in a wide waveband from 900 to 1700 nm. Owing to the algorithmic model embedded in-sensor, immediate results are provided onsite for users, which is especially valuable for nonspecialists. As a proof of application, the authors experimentally detect adulterated green tea and achieve real-time identification with an accuracy better than 90%. This spectral sensor with edge artificial intelligence breaks the dependence of NIRS on external algorithms and paves the way toward miniaturized, integrated, and intelligent NIRS, opening more possibilities for incorporation into consumer devices and the Internet of Things.

nanowires. [11,13] However, for a given grating and detector array, the resolution is proportional to the optical path length, which causes a trade-off between size and performance. In parallel, miniaturized Fourier-transform (FT) interferometers and Fabry-Pérot (FP) interferometers have been fabricated using microelectromechanical systems (MEMS) technology. [1] Without spatial dispersion, MEMS-based FT spectrometers collect spectral information on a single detector, offering a high light throughput and a smaller, more cost-effective alternative structure. Tunable FP filter-based microspectrometers share similar advantages. [12] However, both suffer from limited reliability and long scanning periods due to their extra moving parts. Besides, another strategy, on-chip spectrometers or so-called spectral sensors, has been proposed, which directly integrates narrowband filter arrays on photodetectors to achieve snapshot acquisition of spectra. Various filter schemes based on thin-film FP cavities, photonic crystals, and metasurfaces have been demonstrated. [11][12][13][14][15][16][17][18] The linear variable filter (LVF) is a notable example, with an extremely compact and rugged structure and no moving parts. Integrated with an array detector, an LVF-based spectral sensor can provide good optical throughput and very short spectral acquisition times. [1] In addition, reconstruction algorithms such as compressed sensing and machine learning have been introduced in computational spectral sensors to achieve spectral super-resolution. [16,18] Unfortunately, such spectral sensors have been investigated primarily on CMOS platforms in the visible region, and they remain non-intelligent, lacking integrated data analysis.
Compared to scaling down the size, on-chip integration of AI analysis models in spectral devices is more significant for nonspecialist users. By reducing unnecessary data movement, an intelligent spectral sensor offers low latency, high security, and low data bandwidth. [8][9][10] Recently, CMOS image sensors with AI processing capabilities have been developed by Sony Semiconductor Solutions and applied in the visible region. [7] To balance size and performance, the conventional readout mixed-signal circuit and the AI digital circuit were fabricated in different manufacturing processes and integrated in a 3D stacking structure. This indicates that intelligent sensors are transitioning from study to application. Unfortunately, to the best of our knowledge, the desired intelligent spectral sensors, especially for NIRS, remain largely under study. The key challenge is that, although in-sensor computing-and-memory techniques have recently made rapid progress in CMOS systems, detect-and-memorize materials in the near-infrared are still scarce. Although some studies propose deploying spectral analysis models in the cloud or on a smartphone, served via wireless communication, all the algorithms in such schemes remain physically separated from the NIRS hardware. [19][20][21][22][23][24] Therefore, realizing a near-infrared intelligent spectral sensor with edge AI remains a valuable open problem for the spectroscopy community.
InGaAs photodetectors play an important role in NIRS owing to their excellent performance from the visible to the short-wave infrared at near-room temperature. [25][26][27] In this work, we demonstrate a near-infrared intelligent spectral sensor through monolithic integration of an edge AI analysis module and an InGaAs spectral sensor. By embedding an extrinsic chip processor into a 256 × 1 InGaAs focal plane array (FPA) integrated with an LVF, through an advanced 3D heterogeneous hybrid-integration approach with system-in-package, more than 200 spectral channels with a spectral resolution of 1.25% of the central wavelength are achieved in 900-1700 nm in one snapshot. Raw spectra and immediate identification results are optional outputs for different operators and applications. As a proof of concept, we experimentally detect adulterated green tea and achieve real-time identification with an accuracy better than 90%. The InGaAs intelligent spectral sensor studied here can synchronize spectral sensing and identification computing in numerous miniaturized-spectrometer NIRS applications where indicative results must be acquired on-site.
In the following sections, the principle of the proposed InGaAs intelligent spectral sensor is illustrated first. Then, the basic performance of this sensor is introduced and compared with a commercial InGaAs portable spectrometer. Finally, as a validation of practicality, an identification experiment on adulterated tea is conducted with this spectral sensor and its on-chip analytical model. The spectral acquisition, AI model training, on-chip model deployment, and real-time identification are demonstrated in detail.

Principle of InGaAs Intelligent Spectral Sensor
The proposed InGaAs intelligent spectral sensor consists of three parts: the LVF for wavelength-selective transmission, the InGaAs FPA to acquire the corresponding spectral signal, and the AI chip with the embedded analysis model. Figure 1a schematically describes the structure and working principle of the proposed spectral sensor. The applied LVF is a thin-film filter deposited by an energetic physical vapor deposition process, and its central wavelength changes linearly with position from 900 to 1700 nm. [28][29][30][31] The applied linear InGaAs FPA employs a hybrid structure, in which the photodiode array used as the photosensitive area and the CMOS readout integrated circuit (ROIC) are implemented in separate chips and vertically mounted using bumps. Since the wavelength range of the InGaAs FPA matches the working region of the LVF perfectly, no additional cut-off filter is required to suppress out-of-band transmission. The pretrained NIRS analysis model is deployed to an STM32F767-series microcontroller with an Arm Cortex-M7 RISC core, which is embedded into our spectral sensor for on-chip AI inferring. In addition, some supplemental accessories are designed specifically for diffuse reflection measurement, including a collimator based on a plano-convex lens, four halogen bulbs used as the light source, a 3D-printed package with a sapphire window, and a test kit with a white reference (WR), samples, and a dark background (DBG).
Since the LVF and the photosensitive component are manufactured from non-Si materials, a 3D heterogeneous hybrid-integration structure was designed to vertically stack and connect various materials, technologies, and functional components. This third dimension allows extending Moore's law to ever higher density, functionality, and performance, with more diversified materials and devices integrated at lower cost. [32,33] Moreover, this structure also gives a flexible choice of CMOS manufacturing processes for the ROIC and the AI chip. Specifically, we directly bonded the LVF onto the InGaAs FPA with a supporting structure, aiming at an extremely compact and rugged spectral sensor with no moving parts. [34][35][36][37] The AI chip is then embedded into our spectral sensor via a customized printed circuit board (PCB). All components above are assembled into a metallic package with a size of 55 × 30 × 15 mm³ and a total weight of less than 100 g.
On-chip deployment of the NIRS-based AI analytical model is a crucial part of the development of our intelligent spectral sensor. As illustrated in Figure 1b, the on-chip AI workflow can be divided into five steps: i) collecting raw spectra for the training set and validation set; ii) neural-network training and saving of model files; iii) model optimization and translation for on-chip deployment; iv) compiling and on-chip embedding of the reshaped model; and v) on-chip inferring on the test set. Considering the scalability of the edge hardware platform, we adopt TensorFlow and Keras as the deep learning frameworks during modeling. X-CUBE-AI, an AI expansion toolbox provided by STMicroelectronics, is used to translate the model files into chip-compatible C libraries, which can be compiled and burned to the ARM core for edge inferring.

Spectral Performance
Before practical applications, we first studied the actual spectral performance of the proposed spectral sensor. Figure 2a,b shows the normalized spectral responses of selected pixels to monochromatic light between 900 and 1700 nm, with a typical bandpass feature of a Lorentz-Gauss distribution and a cut-off level below OD2. Each pixel of the spectral sensor has a unique central wavelength, determined by the relative position of the LVF and FPA. The exact central wavelength and full width at half maximum (FWHM) of each pixel can be acquired and calibrated by Lorentz fitting, which was studied in detail in our previous work. [38,39] Since the effective region of the LVF is longer than 10 mm and the pitch of the FPA is 50 μm, we obtain more than 200 available spectral channels with significant consistency, as shown in Figure 2c, benefiting from the uniformity of the LVF over the spectral range from 900 to 1700 nm. Figure 2d describes the FWHM of all available pixels; the FWHM roughly increases with the central wavelength λp. The spectral accuracy of our spectral sensor was experimentally determined with a standard reference material (2035a) certified by NIST and compared with the result measured by a Nicolet 6700 commercial FTIR spectrometer. As shown in Figure 2e, the two absorbance curves of 2035a coincide closely along the X-axis, and the deviations of the band locations are indicated in the curves. The wavelength accuracy of all six peaks in 900-1700 nm is better than 2 nm for our spectral sensor and 0.5 nm for the FTIR spectrometer. Considering the significant disparity in resolution and SNR between the two devices, this is still a quite satisfactory result. As for the difference in absorbance along the Y-axis, it is a normal experimental artifact mainly caused by the difference in resolution (≤0.09 cm⁻¹ for the Nicolet 6700) and instrument performance.
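The channel count and resolution quoted above follow directly from the reported geometry. The sketch below, using only the figures stated in the text (10 mm effective LVF length, 50 μm pixel pitch, FWHM ≈ 1.25% of the central wavelength), reproduces the 200-channel figure and the FWHM at a few example wavelengths.

```python
# Channel count implied by the reported LVF length and FPA pitch.
LVF_LENGTH_UM = 10_000   # effective LVF region, 10 mm (from the text)
PIXEL_PITCH_UM = 50      # InGaAs FPA pixel pitch (from the text)

n_channels = LVF_LENGTH_UM // PIXEL_PITCH_UM
print(n_channels)  # 200 available spectral channels

def fwhm_nm(center_nm, resolution_frac=0.0125):
    """FWHM for a pixel whose passband is ~1.25% of its central wavelength."""
    return resolution_frac * center_nm

for wl in (900, 1300, 1700):
    print(wl, fwhm_nm(wl))  # e.g. 1700 nm -> 21.25 nm FWHM
```

This also explains why the FWHM in Figure 2d grows roughly linearly with λp: a fixed fractional resolution implies a bandwidth proportional to the central wavelength.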
The basic performance of this spectral sensor is on a par with that of a commercial InGaAs portable spectrometer; a detailed comparison is shown in Table S1, Supporting Information. [35,36]

Spectral Acquisition of Adulterated Tea
Green tea has been used widely, and in high doses, for centuries as a health tonic in many countries. To improve the flavor and lustrousness of tea products, illegal producers may spray sugary solution during tea processing. Excessive sugar not only confounds quality identification but also brings potential food-safety problems, such as moisture absorption and bacterial breeding. In the fast customs clearance of exported green tea, real-time detection techniques for sugar adulteration are urgently needed to strike a balance between accuracy and efficiency. In this article, mixtures of sugar and tea in different proportions were classified by our spectral sensor and its on-chip identification model. As a preliminary validation of this experiment and method, quantitative regression was performed on similar sucrose-doped tea in our early work, [40] using an FTIR benchtop spectrometer (IRTracer-100, Shimadzu) and a traditional BP neural network for modeling. We simulated the spectra using data acquired from the FTIR spectrometer, at the wavelength range and resolution of the spectral sensor, to obtain a better verification effect.
In the spectral acquisition of this experiment, all samples were evenly divided into six groups with sucrose contents of 3%, 8%, 13%, 18%, 23%, and 28%, labeled by high-performance liquid chromatography (HPLC). The reflectivity of the sugar-adulterated green tea was acquired by this spectral sensor with the sampling accessories and calculated on the chip processor according to the equation

R = (S_Sample - S_DBG) / (S_WR - S_DBG)

where S_Sample, S_DBG, and S_WR are the signals of the sample, the DBG, and the WR, respectively. Since the analytical model had not yet been trained and integrated, the spectral sensor could only output the original reflectivity spectra at this stage. Figure 2f shows the raw spectra of 120 samples in the training set and validation set with different sucrose contents. The spectral curves of these samples show the typical plant characteristics of green tea, consistent with our early work and with spectra measured by commercial instruments in other studies. [40][41][42][43][44][45][46] Reducing the number of spectral sampling points would decrease the identification accuracy, which is also reflected in the smoothness of the curves. Fortunately, we found in our previous work that a spectral resolution of 10-20 nm is sufficient for the qualitative analysis of sugar-adulterated green tea, and the characteristic bands mainly fall within 1300-1700 nm. [40] The baseline drift of the spectra is mainly caused by the instability of the halogen bulbs as a tiny light source, which can be improved during data processing. Additionally, although all samples were dried before grinding, the absorption peaks of moisture located around 1200 and 1450 nm are still obvious. Thus, all the spectra were collected quickly and simultaneously to minimize the impact of moisture absorption.
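The on-chip reflectivity computation above is a standard dark/white two-point correction applied per channel. A minimal sketch, using hypothetical ADC counts purely for illustration:

```python
# Per-channel reflectivity from sample, dark-background (DBG) and
# white-reference (WR) signals: R = (S_sample - S_DBG) / (S_WR - S_DBG)
def reflectivity(s_sample, s_dbg, s_wr):
    """Two-point corrected reflectivity for each spectral channel."""
    return [(s - d) / (w - d) for s, d, w in zip(s_sample, s_dbg, s_wr)]

# Toy 3-channel example (arbitrary, hypothetical counts)
s_dbg = [100, 110, 105]
s_wr  = [4100, 4110, 4105]
s_smp = [2100, 2110, 2505]
print(reflectivity(s_smp, s_dbg, s_wr))  # [0.5, 0.5, 0.6]
```

Because the DBG and WR arrays are stored once on the microcontroller (as described in the Experimental Section), each subsequent sample frame needs only one subtraction and one division per channel.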

Modeling Results and Edge Identification
In this section, we aim to prove that a deep learning spectral analysis model deployed at the edge (integrated in-sensor) can achieve the same effect as traditional algorithms deployed on a computer. An identification model for adulterated green tea based on a convolutional neural network (CNN) is first established and validated on the computer. After that, the verified model is deployed onto the chip processor integrated with the InGaAs spectral sensor for edge identification of adulterated green tea. As shown in Figure 3a, we designed a one-dimensional CNN with 12 layers, called 1D-CNN-12, [47][48][49] including an input layer, convolution layers, pooling layers, a fully connected layer, and an output layer. The input of 1D-CNN-12 is the raw spectra of the green tea samples in the training set, with a dimension of (200, 1). Because the reflectivity from 900 to 1700 nm lies in the range of 0.4 to 0.8, no extra data preprocessing is needed before inputting, which helps to reduce on-chip computing. The convolution kernels used in convolution layers C1, C2, C4, C5, C7, and C8 number 4 (with a size of 2), 8 (with a size of 4), 16 (with a size of 3), 32 (with a size of 2), 64 (with a size of 3), and 64 (with a size of 2), respectively. The activation function is tanh. Maximum pooling is used in pooling layers S3 (with a size of 2), S6 (with a size of 4), and S9 (with a size of 2). The convolution layers are set up with padding and a stride of 1. After flattening, the output of 1D-CNN-12 is produced by a fully connected layer with a softmax activation function. We adopted the dropout method with a rate of 0.1 to prevent over-fitting during the training phase. The epochs are set to 600 with a batch size of 8. Figure 3b shows the training accuracy and loss of 1D-CNN-12.
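The quoted count of 20 682 trainable parameters can be reproduced from the layer sizes above. The sketch below assumes Keras-style conventions ('same'-padded convolutions with biases, max pooling whose stride equals its pool size, and a final dense layer over the six sucrose classes); under those assumptions it matches the figure in the text exactly.

```python
# Parameter count of 1D-CNN-12 from the layer sizes given in the text.
# Assumptions: 'same' padding keeps conv output length; each max-pooling
# floor-divides the length by its pool size; every layer carries biases.
convs = [(4, 2), (8, 4), (16, 3), (32, 2), (64, 3), (64, 2)]  # (filters, kernel)
pools = {1: 2, 3: 4, 5: 2}  # pool size after the conv at this 0-based index

length, channels, params = 200, 1, 0   # input dimension (200, 1)
for i, (filters, kernel) in enumerate(convs):
    params += filters * (kernel * channels + 1)  # weights + bias per filter
    channels = filters
    if i in pools:
        length //= pools[i]

params += (length * channels) * 6 + 6  # dense softmax layer to 6 classes
print(params)  # 20682 trainable parameters, matching the text
```

Note that the match requires the pooling layers to actually downsample (S3 and S9 by 2, S6 by 4); with a pooling stride of 1 the flattened feature vector would be far larger.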
In the beginning, the training loss decreases slowly, then drops sharply after 50 epochs, showing that the optimizer has found a more efficient optimization direction after its initial attempts. Over the following 300 epochs, the training loss keeps decreasing rapidly with oscillations. After 400 epochs, the descent slows and the loss gradually stabilizes near zero. At this point, the training accuracy is close to 100%, indicating that the model training process can be ended. The proposed 1D-CNN-12 contains 20 682 trainable parameters and can be trained within 1 min.
The validation set is used to verify the recognition ability of 1D-CNN-12 on the computer, and the identification results are shown by the confusion matrix in Figure 3c. Because the spectra in the validation set were not used in model training, their identification results provide a more objective evaluation of the model. The vertical and horizontal coordinates of the confusion matrix represent the real category of the sample and the category predicted by the model, respectively. The prediction accuracy over all 36 spectra in the validation set was 94.4%. For the sucrose contents of 3%, 8%, 13%, and 18%, the predictions were completely correct. One 23% sample was mistakenly identified as 28%, and one 28% sample was mistakenly identified as 23%. These results indicate that the identification accuracy of 1D-CNN-12 decreases when the sucrose concentration exceeds 20%, possibly because the excessively high sucrose proportion influences the model's inference. Overall, although two samples with high sucrose content were confused, the predictions were still close to their real categories, which is consistent with our early research. In the application of fast customs clearance, if a sucrose content of 10% is set as the adulteration threshold, the proposed 1D-CNN-12 can be regarded as a reliable model.
As the final validation of practical application, we deployed the trained model into the spectral sensor for real-time analysis. [50][51][52] As shown in Figure 3d, 1D-CNN-12 was optimized and simplified to nine layers by the X-CUBE-AI tools before deployment for edge inferring. All the one-dimensional convolution layers in 1D-CNN-12 are converted to 2D ones after translation. Moreover, each convolution layer and its following pooling layer are combined to reduce the number of layers. The complexity of the translated model can be indicated by the number of multiply-and-accumulate operations (MACCs), which is 633 622 for the reshaped 1D-CNN-12. For the ARM Cortex-M7 core used in our work, each MACC consumes roughly six cycles; thus we can estimate an inference time of 17.6 ms at a base frequency of 216 MHz. The Flash and RAM required for the deployment of 1D-CNN-12 are 80.79 and 10.18 KB, respectively, while 1024 KB of Flash and 512 KB of RAM are available on the selected chip. Model compression was not performed in view of the sufficient hardware resources. Using random inputs in X-CUBE-AI, we performed off-chip and on-chip verification of the generated C model. The verification results suggest that the generated C model is 100% consistent with the original 1D-CNN-12.
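The latency estimate above is a simple cycle-count calculation from the figures quoted in the text (633 622 MACCs, ~6 cycles per MACC, 216 MHz core clock):

```python
# Back-of-envelope inference latency on the Cortex-M7, per the text's figures.
MACC = 633_622            # multiply-and-accumulate ops in the reshaped model
CYCLES_PER_MACC = 6       # rough cost per MACC on this core
F_CLK_HZ = 216_000_000    # STM32F767 base frequency

latency_ms = MACC * CYCLES_PER_MACC / F_CLK_HZ * 1000
print(round(latency_ms, 1))  # ~17.6 ms per inference
```

Since spectral acquisition itself takes only milliseconds, this confirms that the end-to-end sensing-plus-inference loop runs comfortably in real time.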
On-chip identification with instantaneous, indicative results was conducted with the test set, and the experimental results are illustrated in Figure 3e. The prediction accuracy over all 12 samples was 91.7%; only one 28% sample was mistakenly identified as 23%. Limited by the experimental materials, the test set is not large, but the on-chip identification results are still consistent with those of the validation set inferred by 1D-CNN-12 on the computer. The results demonstrate that the deployment of the edge AI model is successful and that our concept of an intelligent spectral sensor is tenable.

Conclusion
In conclusion, we developed an InGaAs intelligent spectral sensor by embedding an ARM Cortex-M7 chip processor with an AI analytical model into a spectral sensor based on an LVF and a linear FPA. Based on the 3D heterogeneous hybrid-integration structure, more than 200 spectral channels with a spectral resolution of 1.25% of the central wavelength are obtained in 900-1700 nm in one snapshot, with optional output of raw spectra or immediate identification results. As a proof of edge AI applications, adulterated green tea was experimentally detected by this spectral sensor with a real-time identification accuracy better than 90%. Our intelligent spectral sensor breaks the dependence of NIRS on external algorithms and can provide instantaneous, indicative results for in situ analysis and field measurements. Additionally, further scaling-down of our spectral sensor could be actualized by integrating the collimating optics on chip and adopting ceramic or plastic packages. We believe that such an intelligent spectral sensor will bring many more opportunities for NIRS in scientific research and industry, such as consumer electronics and the Internet of Things.

Experimental Section
Materials: The applied LVF was purchased from Vortex Optical Coatings.

Integrated Assembly: The InGaAs FPA employed a hybrid structure, in which the linear photodiode array and ROIC were electrically connected by indium bumps. The LVF was placed on the photodiode array by a support structure with a spacing of 0.1 mm. The FPA and LVF were mounted on a silicon nitride ceramic substrate. DW-3 cryogenic adhesive was used for fixing, with a curing condition of 45 °C for 24 h. The STM32F767 chip was mounted on a customized PCB and electrically connected to the FPA by wire bonding. All components above were assembled into a metallic package with a size of 55 mm × 30 mm × 15 mm and a total weight of less than 100 g.
Sampling Accessories: A collimator based on a GCL-010 818 lens was designed to collect incident light with an aperture of 2 mm in diameter. The detailed design of the collimator is shown in Figure S1, Supporting Information. A 3D-printed package with a sapphire window and four halogen bulbs as the light source was designed, as shown in Figure 1a. In addition, the test kit, including the WR, samples, and DBG, was carefully sized for adaptation to the sampling accessories. The SNR of the spectral sensor with the reflective sampling accessories was tested and is demonstrated in Figure S2, Supporting Information.
Spectral Acquisition: According to a ratio of 7:3:1, all samples in the six groups were divided into training, validation, and test sets. Raw spectra of the training and validation sets were collected to train 1D-CNN-12 models on a computer, while samples in the test set were reserved for validating on-chip inferring. The DBG signal was collected first with the halogen bulbs off. The bulbs were then turned on and held for 1 s, after which the WR signal was collected with an integration time of 3 ms. DBG and WR were stored separately in arrays established on the microcontroller chip. Finally, samples of the training and validation sets were collected with the same integration time and converted to reflectivity on the chip. In particular, the surface of each sample was flattened to avoid undesired spectral interference.
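The 7:3:1 split can be sketched as below. The total of 132 samples is an assumption inferred from the set sizes reported earlier (120 training-plus-validation spectra and 12 test samples); under that assumption the split reproduces those figures exactly.

```python
# Sketch of the 7:3:1 train/validation/test split described in the text.
# Total sample count of 132 is inferred (84 + 36 = 120 train+validation,
# 12 test), not stated explicitly in the text.
def split_731(n_total):
    """Partition n_total samples in the ratio 7:3:1 (train, validation, test)."""
    unit = n_total // (7 + 3 + 1)
    return 7 * unit, 3 * unit, 1 * unit

train, val, test = split_731(132)
print(train, val, test)  # 84 36 12
```

This is consistent with the 36 validation spectra in Figure 3c and the 12 test samples in Figure 3e.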
Model Training and Deploying: Model training was conducted on a Lenovo computer with the GTX 1660 graphical processing unit (GPU) and 16

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.