Next-generation machine vision systems incorporating two-dimensional materials: Progress and perspectives

Machine vision systems (MVSs) are an important component of intelligent systems, such as autonomous vehicles and robots. However, ever-growing data volumes and new application scenarios impose new requirements on the next generation of MVSs. There is an urgent need for new material systems to complement existing semiconductor technology based on thin-film materials, and new architectures must be explored to improve efficiency. Because of their unique physical properties, two-dimensional (2D) materials have received extensive attention for use in MVSs, especially in biomimetic ones: the human visual system, which can process complex visual information with low power consumption, provides a model for next-generation MVSs. This review summarizes the progress and challenges of applying 2D-material photodetectors in sense-memory-computing integration and biomimetic image sensors for machine vision.


| INTRODUCTION
Machine vision systems (MVSs) simulate human visual functions to observe and recognize the objective world. 1,2 They can be used for image acquisition, image processing, and feature recognition. 3 With the rapid development of information integration, machine vision has become indispensable in automated production, autonomous driving, robotics, security, and other domains. 2 New application scenarios and increasing quantities of data have created a demand for MVSs with faster parallel processing, higher energy efficiency, smaller volume, and lower price. 4 Image acquisition by an MVS depends on the image sensor. At present, most commercial visible-image sensors are either charge-coupled devices (CCDs) or CMOS image sensors (CISs). 3 Infrared photosensor materials include cooled mercury-cadmium-telluride (MCT), InSb, quantum wells, and Type-II superlattices, and uncooled VOx, PbS, PbSe, InGaAs, and amorphous silicon. 5 In recent years, two-dimensional (2D) materials (i.e., materials with atomically thin layers) have become a powerful competitor. [6][7][8][9][10] Their extreme thinness, quantum electronic states, large carrier mobility, and localized optical transitions give them many advantages over traditional materials. 11 2D materials are a large family, including metals, semi-metals, semiconductors, and insulators. 12 Because they are so thin, their band structure can be adjusted through a localized electric field or external stress; thus, their detection waveband can range from ultraviolet to terahertz. 13 Because the layers of 2D materials are bonded by van der Waals (vdW) forces, with no dangling bonds, they can be easily integrated with each other or with memristor circuits, regardless of lattice mismatch or deposition temperature. 11,14 Combining 2D materials with silicon chips to form vdW heterogeneous structure platforms can greatly promote vertical integration and functional diversification. 15,16
However, an MVS based on the von Neumann architecture encounters difficulties when applied to new scenarios with higher performance requirements. The image acquisition, storage, and processing modules of traditional von Neumann structures are physically separated. 17 Visual information is obtained through image sensors in the MVS and stored in a storage module. Further image processing requires that large amounts of image data be transferred repeatedly between the memory and the processor, resulting in high energy consumption. 18 In addition, the bandwidth of the storage module is much smaller than that of the processing module: the so-called "memory-wall" problem, which limits the speed of the entire system. 19 Therefore, when implementing parallel tasks such as image processing, the von Neumann structure is inefficient, energy-hungry, and slow. 2,20 The human visual system (HVS) can recognize various objects and perceive visual information in a complex environment with high energy efficiency. 21 Image information is preprocessed near the retina before being sent to the brain. 22 Therefore, there is no need for an analog-to-digital (AD) conversion process, which would require high power consumption. HVS-inspired machine-vision preprocessing functions utilizing 2D-material photodetectors have therefore been proposed. To further improve data-processing capacity, speed, and power consumption, neural networks can be combined with such novel photodetectors. A 2D-material photodetector array can be integrated with a memristor module, the resistance value of which can then be used as the weight of the neural network. More importantly, because of the unique layered structure of 2D materials, the carrier density can be precisely tuned by shifting the gate voltage, and even the carrier type can be changed, altering the response range or responsivity.
The adjustable responsivity 23,24 can mimic the variable weights of a neural network, allowing preprocessing to occur in the detection module and thereby increasing speed and reducing energy consumption. Detection, calculation, and storage functions can thus be realized simultaneously in one device. 9 This review summarizes a variety of attempts to apply 2D-material photodetectors to MVSs, including vertical integration with silicon-based readout circuits and integrated sensing, memory, and computing architectures. As a new material system, 2D materials face many challenges in the transition from the laboratory to industry. This paper considers not only the advantages and diversified functions of 2D materials but also the limitations of existing MVSs. Some possible future applications are considered in detail. This review provides a basis for the application of 2D-material photodetectors in terms of materials, performance, and systems. The main challenges to practical applications are highlighted, including the preparation of large-area materials and the calibration of key performance indicators. We suggest a possible solution to the latter problem, at least for the case of normalized detectivity. Silicon semiconductor technology, the foundation of the modern electronics industry, has been developing vigorously since the 1960s. The appearance of 2D materials makes possible the future development of the microelectronics industry at the atomic scale. Benefiting from van der Waals forces between layers, 2D materials can be stacked like LEGO blocks at the atomic level without restrictions on lattice matching. Therefore, 2D-material-based detectors can be vertically integrated with Si-based chips at high densities (Figure 1C). Already, a 5 × 5 MoS2-based detector array has been heterogeneously integrated with a silicon chip 26 (Figure 1D).
In another study, a 388 × 288 detector array using graphene and quantum dots was heterogeneously integrated with a CMOS readout circuit, with a detection waveband extending to 2000 nm 25 (Figure 1A). This was the first time that high-density vertical interconnection between an infrared detector and a silicon-based chip was realized. Graphene was used as the channel of the photodetector, while PbS quantum dots were spin-coated onto the graphene as the absorption layer. The working mechanism of this device was photogating. The bottom silicon chip, acting as the readout circuit, integrated and amplified the signal from the detector (Figure 1B). This type of heterostructure, fabricated by dry transfer, 27,28 is particularly important for the development of infrared detection because it is not restricted by lattice matching between materials.
As described by Liu et al., 11 vdW integration can be developed further. The detector, memory, and logic circuit can be vertically integrated to realize a system-level vdW detection chip (Figure 1E). This technology significantly reduces the MVS volume while increasing the pixel density. However, it is still based on the von Neumann structure and remains subject to the memory wall and other unsolved problems. Future MVSs need new architectures that integrate detection, storage, and processing capabilities at the device level.

| Limitations of von Neumann architecture
In existing MVSs, light passes through a set of lenses before converging onto photodetectors, which convert optical signals into analog photocurrent signals. The readout circuit integrates the analog photocurrent into an analog voltage. After filtering and noise reduction, the analog voltage is converted into a digital signal by AD conversion. 29 The central processing unit (CPU) or graphics processing unit (GPU) conducts noise reduction, nonuniformity correction, graphics-distortion correction, and other processing before sending the digital signal into a trained neural network for recognition. 29,30 The working process of the MVS is illustrated in Figure 2A. The architecture is inefficient in terms of power consumption and data-transfer speed, in that the memory and processing units are physically separated. 31 A near-sensor architecture adopting near-memory computing has been proposed that shortens the transmission distance between storage and computation and improves energy efficiency in traditional thin-film material systems.

FIGURE 1 (A) Imaging system based on CMOS-integrated graphene-quantum dot photodetectors. 16 Inset: a photograph of the packaged monolithically integrated graphene-based image sensor. (B) Side view of the graphene photoconductor and the underlying readout circuit. 25 (C) Schematic illustration of a 5 × 5 MoS2-based detector array and silicon chip. 26 (D) Image of the top monolayer TMD photodetector array. 26 (E) Detector, memory, and logic circuits can be vertically integrated to realize a system-level vdW detection chip. 11 (A) Reproduced with permission. 16 Copyright 2019, Springer Nature. (B) Reproduced with permission. 25 Copyright 2017, Springer Nature. (C,D) Reproduced with permission. 26 Copyright 2016, IEEE. (E) Reproduced with permission. 11 Copyright 2019, Springer Nature.
Near-memory computing frameworks 32 have been based on cutting-edge packaging technologies, such as system-in-package (SiP), 2.5D SiP, and 3D SiP. 33 There has been no attempt to introduce system-on-a-chip (SoC) technology (which integrates the CPU, dynamic random-access memory (DRAM), read-only memory (ROM), input/output interfaces (IO), analog-to-digital converters (ADCs), and other modules on one chip) into image-sensor design, because the CIS and CPU use different fabrication processes. In particular, most infrared-sensitive materials, such as InGaAs, InSb, MCT, and VOx, are not easily compatible with the CMOS process. One popular method of 2.5D/3D packaging, the through-silicon via (TSV), encapsulates chips with different functions on one substrate, thereby significantly shortening the connections between modules and improving the connection density 34 (Figure 2C). Sony has widely adopted TSV packaging in visible CIS designs 35,36 (Figure 2B). In addition, by adopting the TSV package, a newly released infrared InGaAs product significantly reduces pixel size and improves pixel density and transmission speed, resulting in more colorful, higher-frame-rate imaging. However, this technology is still not mature; it has a high price and low yield compared with other packaging technologies. TSV packaging for other infrared-sensitive materials is at the laboratory-prototype stage, and many technical problems remain to be solved. Taking MCT as an example, defects introduced by the TSV process result in a high dark current and other problems.
In essence, near-memory computing is still dominated by the von Neumann architecture. At the 10 nm process node, the data bus accounts for more than 69% of power consumption (Figure 2E). Transmission speed is limited by the connection density and will soon hit a bottleneck. The response time of existing high-performance photodetectors is less than 1 ns, whereas the write time of ROM is on the order of milliseconds; thus, storage is much slower than detection. This speed mismatch between modules also significantly limits recognition speed (Figure 2D).
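The scale of this mismatch can be made concrete with a back-of-envelope estimate. The timescales below are the order-of-magnitude figures quoted above (sub-nanosecond detection, millisecond-scale storage); the calculation is illustrative, not a measurement of any particular system.

```python
# Hypothetical illustration of the detector/memory speed mismatch:
# the numbers below are order-of-magnitude assumptions from the text.
detector_response_s = 1e-9   # high-performance photodetector: < 1 ns
memory_write_s = 1e-3        # ROM-class storage: ~ milliseconds

# If every frame must be written to storage before processing, memory,
# not the detector, sets the achievable throughput.
max_rate_detector = 1.0 / detector_response_s   # events per second
max_rate_memory = 1.0 / memory_write_s          # frames per second

mismatch = max_rate_detector / max_rate_memory  # ~six orders of magnitude
print(f"detector-limited rate: {max_rate_detector:.0e} /s")
print(f"memory-limited rate:   {max_rate_memory:.0e} /s")
print(f"speed mismatch:        {mismatch:.0e}x")
```

With these assumed timescales, the storage module throttles the pipeline by roughly a factor of a million, which is why the mismatch, not the detector, dominates recognition speed.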

| The human visual system
Nature has been a source of ideas for engineers in fields as diverse as aerodynamics, robotics, surface engineering, structural engineering, and materials science. [37][38][39] The natural HVS, the product of a long evolutionary process, may provide inspiration for solving the problems faced by MVSs. It can recognize various objects and perceive visual information in a complex environment with high energy efficiency. 21 The HVS has two main advantages over existing MVSs. First, human eyes do not perform AD conversion after receiving visual information but directly preprocess the analog signal in the retina, greatly improving processing speed while maintaining computational accuracy high enough for the eye to serve as most humans' major source of information about the outside world. 40 Second, image information is preprocessed near the retina before being sent to the brain, which improves both recognition and energy efficiency. 22 The human retina is composed of a photoreceptor layer, a bipolar cell layer, and a ganglion cell layer. The light signal is first received by photoreceptors, which are divided into cones and rods (Figure 3A). Color determination and light collection under strong illumination are performed by cone cells; light collection under weak illumination is performed by rod cells. The signals collected by the photoreceptors are then transmitted to the bipolar cells to generate bidirectional electrical signals that are modified by amacrine cells before being transmitted further. 41 Amacrine cells modify not only the output from the bipolar cells but also that from other associated amacrine cells, so different combinations of them perform different functions. The modified information is then sent to the ganglion cells (and eventually to the brain).
For initial integration or preconditioning in the retina, 42 the electrical signal goes to the excitation-inhibition neural network, which is composed of intermediate nerve cells; this network realizes the center-surround antagonism function and improves the ability of ganglion cells to recognize contours. 40 The center-surround antagonistic receptive field of ganglion cells corresponds to the difference of Gaussians in a convolutional neural network (CNN) algorithm (Figure 3B). It can convolve the original input image to extract the image's outline (Figure 3C). Next, the retina categorizes and packages the preprocessed information and sends it through the optic chiasm to the visual cortex of the brain.
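The center-surround operation described above can be sketched numerically as a difference-of-Gaussians filter: a uniform region produces almost no response, while an edge produces a strong one. The kernel size and the two sigma values below are assumptions for illustration, not retinal parameters.

```python
import numpy as np

# Difference-of-Gaussians (DoG) sketch of the center-surround receptive field.
def gaussian_kernel(size, sigma):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def dog_kernel(size=7, sigma_center=1.0, sigma_surround=2.0):
    # narrow excitatory center minus wide inhibitory surround; sums to ~0
    return gaussian_kernel(size, sigma_center) - gaussian_kernel(size, sigma_surround)

def convolve2d(image, kernel):
    p = kernel.shape[0] // 2
    padded = np.pad(image, p, mode="edge")   # replicate borders
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kernel.shape[0],
                                      j:j + kernel.shape[1]] * kernel)
    return out

k = dog_kernel()
flat = np.ones((16, 16))                     # featureless scene
edge = np.zeros((16, 16)); edge[:, 8:] = 1.0 # vertical contour
print(np.abs(convolve2d(flat, k)).max())     # suppressed (near zero)
print(np.abs(convolve2d(edge, k)).max())     # contour extracted
```

Because the kernel sums to approximately zero, uniform illumination is rejected and only intensity changes, that is, contours, survive, which is exactly the antagonism the ganglion-cell receptive field implements.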
The human brain is composed of many neurons connected through synapses; it implements learning and complex cognitive tasks with high energy efficiency, playing the role of a CPU. The biological neural network (BNN) in the human brain consists of neurons rich in synapses that undertake computation and storage simultaneously. 43 Thus, mimicking the HVS to integrate sensing, storage, and processing could solve the problems of speed mismatch and low energy efficiency in the von Neumann architecture. 44 The human retina can perform many preprocessing steps, including integrating visual information and extracting contour information, that greatly reduce the processing load of the brain and improve energy efficiency. [45][46][47] Photoelectric hybrid neural networks based on photonic memristors can mimic the sensing and preprocessing functions of the HVS (Figure 4A). Zhou et al. demonstrated a two-terminal photonic memristor 48 (Figure 4B). This device had optically tunable characteristics, including long-term memory, short-term memory, and a storage time tunable by the written optical-pulse dose.
The nonlinear change in resistance can function as high-dynamic-range (HDR) preprocessing in image processing. The main features of the image were highlighted after preprocessing, and the image contrast was enhanced. The recognition efficiency of the algorithm was improved compared with that of a neural network without image preprocessing: after 1000 training epochs, the recognition rate with image preprocessing reached 0.986, whereas that without preprocessing was only 0.980 after 2000 training epochs (Figure 4C). Photonic memristor arrays can thus realize image sensing and memory functions, as well as neuromorphic vision preprocessing, and the proof-of-concept device improved processing efficiency and image recognition. MVS researchers have also attempted to imitate the type of pattern recognition that occurs in the retina of the human eye. 40 Bipolar cells connected to photoreceptors are classified into ON and OFF types that respond to light stimulation in opposite ways. This architecture, which enables the human eye to preprocess the image and transmit it to the cerebral cortex for further processing (Figure 4D), is analogous to the use of floating gates to regulate the responsivity of an artificial detector. Unlike a traditional FET, a floating-gate device adds a charge-trapping layer to the dielectric layer; the charge of the floating-gate layer is controlled by the impulse voltage on the control gate (Figure 4E,F). Thus, the carrier concentration in the channel is controllable: photodetectors based on floating-gate memory can integrate detection and storage by adjusting the channel conductance. 49 Wang et al. demonstrated a bio-inspired WSe2/h-BN/Al2O3 vertical heterostructure that mimics the vertical integration of the photoreceptor and bipolar cell layers in the retina. 40 This device could be regulated by the back-gate voltage to achieve a positive or negative photoresponse, imitating that of the bipolar cell layer (Figure 4G).
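The gate-tunable bipolar photoresponse described above can be sketched with a minimal model in which the back-gate voltage selects the sign of the responsivity, mimicking ON and OFF bipolar cells. The linear response, the responsivity magnitude, and the threshold voltage below are assumptions for illustration, not parameters of the reported device.

```python
# Minimal sketch of a gate-selectable ON/OFF photoresponse.
def photocurrent(optical_power_w, gate_voltage_v, r_max=0.2, v_th=0.0):
    """Photocurrent in amperes; the responsivity sign follows the gate.

    r_max (A/W) and v_th (V) are illustrative assumptions.
    """
    responsivity = r_max if gate_voltage_v > v_th else -r_max
    return responsivity * optical_power_w

print(photocurrent(1e-6, +1.0))   # ON state: positive photocurrent
print(photocurrent(1e-6, -1.0))   # OFF state: negative photocurrent
```

The ability of each pixel to contribute current of either sign is what lets an array of such devices sum to a signed convolution, as the receptive-field experiment below exploits.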
An OFF-device surrounded by 12 ON-devices could mimic the biological receptive field in the HVS that detects the edges of objects. In the experiment, the light-source array transmitting the image was turned on column by column, imitating a moving edge. As the edge moved from left to right, the current increased because more ON-devices were activated. It reached a maximum value before the edge reached the central OFF-device, which was later activated to produce a reverse photocurrent. The total photocurrent output of the system then decreased, resulting in an opposite photocurrent peak. This sharp change in photocurrent realized recognition of the image boundary. Another task was image recognition without AD conversion. The center-surround antagonistic receptive field formed by the array performed a convolution operation on the original input image to extract the contour. The CNN formed by the prototype device was used to classify the target image. Jayachandran et al. created an extremely low-power, dynamic, nonvolatile collision-avoidance monitoring system by vertically stacking a single-layer MoS2 photodetector on a floating-gate transistor-based memory device to simulate the lobula giant movement detector (LGMD) neurons of locusts 39 (Figure 4H). These are unique and complex neurons that use a distributed computing architecture to assist in processing visual information. The honeycomb-shaped photoreceptors in the insect's compound eye convert visual stimuli into electrical impulses that pass through the lamina, medulla, and lobula before eventually being transmitted to the dendritic fan-out area of the LGMD neurons. The dendritic branches in the red area receive feedforward suppression; those in the blue area receive feedforward excitation and lateral suppression. The system developed by Jayachandran et al. used the photodetector's excitatory response to the light signal and the floating gate's inhibitory feedback to simulate this natural computational architecture.
The photodetector responded to the approaching object (stimulus signal) by increasing the device current, while the underlying programmable memory stack continually decreased the current (inhibition signal). When an object approached, the excitatory signal was superimposed on the inhibitory stimulus; this caused a non-monotonic change in the device current, simulating the escape response of the locust's LGMD neurons. Thus, the system could detect an impending collision in time and trigger an escape response with nanojoule energy consumption. This in-memory, task-specific computation-and-perception method preprocessed the raw data, greatly simplifying computation and reducing energy consumption.
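The excitation-minus-inhibition mechanism behind the non-monotonic current can be sketched with a toy looming model: a fast-growing excitatory photocurrent opposed by a delayed, faster-growing inhibition. The growth rates and the delay below are illustrative assumptions, not device parameters; the point is only that their difference peaks and then collapses before impact.

```python
import numpy as np

# Toy LGMD-style response: delayed inhibition overtakes excitation,
# producing a non-monotonic "escape trigger" peak before collision.
t = np.linspace(0.0, 1.0, 200)                 # normalized time to impact at t = 1
excitation = np.exp(5.0 * t) - 1.0             # looming object: accelerating signal
inhibition = np.exp(7.0 * np.clip(t - 0.15, 0.0, None)) - 1.0  # delayed, steeper
response = excitation - inhibition             # net device current (arbitrary units)

peak_idx = int(np.argmax(response))
print(f"response peaks at t = {t[peak_idx]:.2f}, well before impact at t = 1")
```

The interior peak is the computational signature the hardware exploits: a simple threshold on the net current fires an escape response while there is still time to act, with no frame-by-frame digital processing.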

| In-sensor computing for machine vision systems
Because of their vdW bonding, 2D materials allow arbitrary stacking to form 3D high-density integration with an electronic memristor array. Connecting 2D photodetectors with a memristor array is an effective way to improve energy efficiency. A memristor is a type of nonvolatile memory whose resistance can be changed nonlinearly by an external stimulus 48,50,51 (Figure 5A). The adjustable resistance can be used as the weight of a neural network. In matrix computing, a single memristor can simultaneously process and store information that would otherwise require a dozen CMOS transistors.
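The matrix computation performed by a memristor crossbar follows directly from circuit laws: Ohm's law gives a per-cell current I = G·V, and Kirchhoff's current law sums each column, so a single analog read computes a vector-matrix product. The conductance and voltage values below are illustrative, not device data.

```python
import numpy as np

# In-memory vector-matrix multiply on a memristor crossbar (sketch).
G = np.array([[1.0, 0.2],        # conductances (stored weights), one per cell
              [0.5, 0.8],        # rows = word lines, columns = bit lines
              [0.1, 0.9]])
V = np.array([0.3, 0.1, 0.2])    # input voltages applied to the rows

# Each column current is the sum of G[i, j] * V[i] over its rows:
# the analog dot product appears in one read step, with no ALU involved.
I = V @ G
print(I)
```

Storing the weight and performing the multiply in the same physical element is what removes the von Neumann transfer between memory and processor for this class of operations.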
Wang et al. proposed a neuromorphic visual system composed of sensors and memory networks mimicking the hierarchical organization and biological functions of the retina. 4 Figure 5B shows a flowchart and the processing of the neuromorphic visual system. The sensor was constructed using a WSe2/h-BN/Al2O3 vdW heterostructure detector that could be adjusted electrically between ON and OFF states. Some common kernel functions of a CNN algorithm were realized by setting different detectors of a 3 × 3 bionic retinal sensor to ON or OFF states (Table 1 summarizes commonly used convolution kernels and their corresponding ON/OFF states). A variety of image-processing operations, such as useless-information filtering and key-information extraction, were realized using different convolution kernels. However, the kernel function represented by the sensor was not adjustable during operation, unlike the dynamic kernel function in a normal CNN algorithm. Moreover, the extracted optoelectronic signal still required transformation into a digital signal before being output to the memristor crossbar network. Although the packaging was not specified in the paper, it seems more likely to have been PCB than SiP. In contrast to processors based on the von Neumann structure, the memristor can process analog information directly, without AD conversion. The neuromorphic system accomplished excellent target tracking by networking the retinomorphic sensor with a recurrent neural network in a fully analog signal environment.
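The array-level convolution can be sketched as follows: each pixel contributes a photocurrent whose sign is set by its ON (+R) or OFF (−R) state, and the summed current under an image patch equals a dot product with the corresponding kernel. The responsivity magnitude and the specific kernel below are assumptions for illustration, not entries from the paper's Table 1.

```python
import numpy as np

# 3 x 3 bionic retinal sensor sketch: ON = +R, OFF = -R, inactive = 0.
R = 1.0                                    # assumed pixel responsivity magnitude
on_off = np.array([[-1, 0, 1],             # a vertical-edge-style kernel
                   [-1, 0, 1],             # encoded purely in device states
                   [-1, 0, 1]]) * R

def patch_response(patch, kernel=on_off):
    """Summed photocurrent of the array viewing one 3x3 image patch."""
    return float(np.sum(patch * kernel))

uniform = np.ones((3, 3))                  # featureless patch
edge = np.array([[0, 1, 1]] * 3)           # vertical edge in the patch
print(patch_response(uniform), patch_response(edge))
```

A uniform patch sums to zero current while an edge yields a strong net current, so the kernel is evaluated by physics at the moment of detection, before any readout or AD conversion.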
Commercial silicon-based photodetectors show a fixed responsivity determined by the structure and doping concentration inside the device. In contrast, the physical properties of 2D materials are easily regulated by external stimuli because of their atomic-layer thickness. 52 An applied voltage controls the carrier concentration and polarity of the 2D material (Figure 5C). Because of the unique layered structure of 2D materials, the carrier density can be precisely tuned by shifting the gate voltage; even the carrier type can be changed, altering the response range or responsivity. A homogeneous WSe2 P-N junction has been realized by regulating the local carrier density in different WSe2 areas using a double gate. 24 Responsivity-regulated P-N diodes based on this homogeneous junction were obtained. The responsivity of the photodiodes could reach 210 mA W⁻¹, and the corresponding power-conversion efficiency was 0.2%. The adjustable responsivity could mimic the variable weights of a neural network. This means that the new photoelectric detector can integrate the processing function into the sensing module, thereby improving visual processing speed and reducing power consumption.
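Responsivity can be translated into an external quantum efficiency (EQE) through EQE = R·h·c/(e·λ); note that EQE is a different figure of merit from the 0.2% power-conversion efficiency quoted above for photovoltaic operation. The wavelength in this example is an assumption for illustration, since the measurement wavelength is not stated here.

```python
# EQE from responsivity: EQE = R * h * c / (e * lambda).
h = 6.626e-34      # Planck constant, J s
c = 2.998e8        # speed of light, m/s
e = 1.602e-19      # elementary charge, C

def eqe(responsivity_a_per_w, wavelength_m):
    """External quantum efficiency (electrons out per photon in)."""
    return responsivity_a_per_w * h * c / (e * wavelength_m)

# Assuming (hypothetically) 532 nm illumination, R = 210 mA/W gives ~49% EQE.
print(f"EQE: {eqe(0.210, 532e-9):.1%}")
```

The same formula also explains why responsivity alone is ambiguous as a figure of merit: the same R corresponds to very different quantum efficiencies at different wavelengths.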
If some physical parameter of the photodetector can be dynamically regulated by an external voltage, as the resistance of an electrical memristor can, then a neural network can be formed directly from the photodetector array. Mennel et al. designed photodiode arrays based on 2D materials, providing a new architecture to replace the traditional von Neumann architecture for image recognition and enabling more in situ calculation on the detector 9 (Figure 5D). As a photodiode array, the device itself constituted a neural network that could recognize and process images simultaneously, breaking through the limitations of traditional separated modules. This significantly improved the speed and efficiency of intelligent recognition. Double-gate electrodes were used to dope the channel materials electrically, giving each pixel device an adjustable responsivity (Figure 5E). Through the artificial neural network, the system executed supervised and unsupervised learning tasks (Figure 5F). The responsivity of each unit served as a weight of the neural network and was adjusted continually to maximize the system's identification accuracy. Because both image sensing and image processing were carried out in an analog signal environment, the speed of the system was limited only by the optical-electrical conversion process. Therefore, image detection and recognition occurred within 50 ns. Theoretically, the system could process 20 million images per second, several orders of magnitude faster than traditional methods. In addition, whereas current MVSs consume a large amount of power per operation, the biological neural network has very low energy consumption per operation: 10⁻¹⁵ to 10⁻¹³ J.
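The in-sensor classifier concept can be sketched as a linear network in which each pixel's tunable responsivity R[i, j] is a synaptic weight and the summed photocurrent of output line j is I_j = Σᵢ R[i, j]·P_i for incident optical powers P. Training then means reprogramming gate voltages. All weights, "images," and the learning rate below are toy values, not the published device's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_classes = 9, 3
R = rng.normal(0.0, 0.1, size=(n_pixels, n_classes))  # tunable responsivities

def forward(P, R):
    """Output currents: computed by summation of photocurrents in-device."""
    return P @ R

def train_step(R, P, target, lr=0.1):
    """Gradient step on squared error; in hardware, a gate-voltage update."""
    I = forward(P, R)
    return R - lr * np.outer(P, I - target)

P = rng.random(n_pixels)              # a toy "image" of optical powers
target = np.array([1.0, 0.0, 0.0])    # desired one-hot output currents
for _ in range(200):
    R = train_step(R, P, target)
print(np.round(forward(P, R), 3))     # converges toward the one-hot target
```

Because the weighted sum is produced by the detection physics itself, inference latency in such a scheme is set by the optical-electrical conversion, consistent with the sub-50 ns figure quoted above.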

| CHALLENGES OF TWO-DIMENSIONAL MATERIAL ARRAY DETECTORS
Photodetectors using 2D materials may enable the development of novel photoelectric sensors for energy-efficient computing and in situ recognition and of curved image sensors with remarkable volume advantages, as will be discussed below. However, commercial viability has not yet been attained. Some technical issues still need to be addressed, including the stable growth and doping of large-scale crystals, standardization of evaluation methods, and development of large-scale heterogeneous integration techniques. In this section, we discuss the obstacles to material growth and device performance, as well as possible ways to overcome them.

| Large-area two-dimensional materials and photodetector arrays
Large-area materials are indispensable for large-scale integrated optoelectronic devices. Although small samples of 2D materials can easily be obtained by mechanical exfoliation, the growth of large-area, high-quality 2D materials remains a major challenge. Because a single layer of 2D material, unlike a conventional thin film, is of nearly atomic thickness, its growth depends strongly on the substrate surface. Multilayer 2D materials, while lacking some of the advantages of single layers, absorb light well and are widely used in photodetectors; however, the van der Waals force between the layers is weak, and the growth process is difficult to control. 53 Growth with a controllable number of layers also remains a challenge.
Many factors that influence the growth of high-quality, large-area 2D materials, such as elemental ratios, surface defects, nucleation processes, growth-catalysis techniques, and phase control, require further investigation. Because growth technology is immature, defects are produced in the material: vacancies, substitutions, anti-sites, and adsorbed atoms. 54 These have a significant influence on the electrical and optoelectronic properties of 2D materials, acting as carrier donors, scatterers, traps, and recombination centers under different conditions. 55 In the preparation process, most defects must be avoided, although beneficial defects can sometimes be exploited to achieve special functions. Therefore, a systematic understanding of 2D-material growth is necessary for the future development of detector arrays. The construction of high-quality heterogeneous structures incorporating large-area materials is another challenge to be faced. Because of the stringent requirements for substrate quality in the growth of 2D materials, 53 potential excess nucleation points must be eliminated before growth. In early work, graphene was grown on the surface of Cu foil, with the active parts of the foil surface passivated through high-temperature, long-duration annealing 56 or electrochemical polishing 57 to achieve millimeter-sized growth. Oxidation of growth substrates to eliminate potential nucleation points has been widely used and intensively studied, pushing the size of graphene domains up to the centimeter scale. [58][59][60][61] By using Ni catalysis and CH4 gas to induce epitaxial growth, it is even possible to prepare meter-scale single-crystal graphene. 62 However, graphene has a zero bandgap, and its poor absorption limits its applicability in photodetection. Therefore, graphene is generally used in heterostructures. Goossens et al.
used large-area graphene and PbS quantum dots to form heterostructures and realized a 388 × 288 near-infrared camera 25 (Figure 6A). The quantum dots acted as the photosensitive layer, while graphene, because of its high mobility, served as a fast carrier-transport channel. 25 When photoexcited, free electron-hole pairs were generated in the quantum dots; the holes migrated into the graphene, whereas the photogenerated electrons stayed in the quantum-dot layer, where they could gate the graphene channel. 63 As a conductive channel with high mobility, graphene is easily affected by an external electric field; as a photosensitive absorption layer, the PbS quantum dots compensate for graphene's weak light absorption and realize near-infrared detection through the gain of the photogating effect (Figure 6B).
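The photogating gain at work in such hybrids can be estimated from the standard relation G ≈ τ_trap/τ_transit: while a photogenerated electron remains trapped in the quantum-dot layer, the fast graphene channel recirculates holes many times. Every number below is an order-of-magnitude assumption for illustration, not a measured value for the reported camera.

```python
# Back-of-envelope photoconductive gain for a graphene/QD photogating device.
mobility = 1.0e3 * 1e-4   # assumed graphene mobility: 1000 cm^2/(V s) -> m^2/(V s)
length = 5e-6             # assumed channel length, m
bias = 1.0                # assumed source-drain bias, V

transit_time = length**2 / (mobility * bias)   # tau_transit = L^2 / (mu * V)
trap_lifetime = 1e-3                           # assumed QD trap lifetime, s

gain = trap_lifetime / transit_time            # carriers recirculated per photon
print(f"transit time: {transit_time:.1e} s, photoconductive gain: {gain:.1e}")
```

With these assumptions the gain is of order 10⁶, which is why a channel material with negligible absorption can still yield a sensitive near-infrared detector; the same trap lifetime, however, limits the response speed.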
Because of their excellent electronic and optical properties, transition-metal dichalcogenides (TMDs) are promising materials for next-generation electronic and optoelectronic devices. Large-area growth of TMDs can be achieved through chemical vapor deposition (CVD), depositing metal precursors before sulfurization or selenization. This can produce centimeter-scale, atomically thin, uniform MoS2, 67,68 NbSe2, 69 and PdSe2 70 crystals. In addition, pulsed laser deposition (PLD), 71 atomic layer deposition (ALD), 72 and metal-organic chemical vapor deposition (MOCVD) 73 have been used to grow large-area materials. In the case of MoS2, however, antiparallel domains can appear during synthesis because of the threefold symmetry of the lattice, resulting in mirror-twin grain boundaries when domains stitch together. 74 Therefore, the main practical obstacle is accurate control of the unidirectional arrangement of MoS2 domains on the wafer. The successful growth of large-area photosensitive materials has greatly promoted the development of detector arrays based on 2D materials. Yang et al. prepared a 5 × 5 MoS2 photodetector array with a responsivity of 25 A W⁻¹ at 637 nm. 26 Jang et al. prepared a 32 × 32 MoS2 photodetector array 64 (Figure 6C) and realized internal neural-network calculation using the continuous photoconductance of MoS2 (Figure 6D); this system successfully recognized handwritten digits. Tan et al. fabricated a 42 × 42 SexTe1−x array as a 2D-material camera for infrared imaging 65 (Figure 6E,F).
Large-area 2D materials are also prone to forming heterojunctions because of vdW forces. Zeng et al. used controlled growth of PdSe2 combined with Si to realize a 4 × 4 photodetector array, achieving highly sensitive detection at 780 nm. 70 An 8 × 8 detector array has been realized by growing large-area PtSe2 combined with Si; a detectivity of 1.26 × 10¹³ Jones was achieved in the near-infrared band at 810 nm. 75 2D materials are excellent candidates for curved image sensors because their layered structure permits them to bend. 2D-material detectors on flexible substrates can cover a broad spectral range from UV to IR. 76 A hemispherical structure can be assembled by bending a flexible substrate. 66 Therefore, many studies have investigated 2D flexible detectors. 76 Choi et al. applied a MoS2/graphene heterojunction photodetector 66 to an artificial retina in an aberration-free single-lens imaging system, as shown in Figure 6G. A flexible photodetector based on a graphene/MoS2 vdW heterojunction achieved a photoresponsivity of 45.5 A W⁻¹. 77 The photodetector array and ultrathin neural-interfacing electrodes were integrated on a flexible printed circuit board to form a soft-implant optoelectronic device, as shown in Figure 6H. In vivo experiments on rats verified the effect of the device on the retina: the flexible device responded to external light pulses and detected both spikes and local-field-potential changes in the rat visual cortex. This was the first bionic eye based on 2D materials. Table 2 summarizes the scale and performance of various photodetector arrays based on 2D materials.
Moreover, black phosphorus 80 (bP), a narrow-bandgap semiconductor, is widely used in room-temperature mid-infrared detection because of its high mobility and 0.3 eV bandgap. 10 Although CVD can be used to synthesize bP bottom-up, only few-layer bP films with lateral sizes of tens of microns have been obtained this way. 81,82 Wu et al. reported a controlled PLD method that can directly synthesize centimeter-scale few-layer bP films with high crystallinity and high homogeneity. 83 Molecular dynamics (MD) simulations show that the pulsed laser, unlike conventional thermally assisted evaporation, promotes a uniform distribution of bP clusters in the physical vapor, thereby reducing the formation energy of the bP phase and enabling large-scale growth of few-layer bP films. A centimeter-scale field-effect transistor (FET) array based on a few-layer bP film was fabricated. It exhibited good electrical characteristics in terms of carrier mobility and on/off current ratio: the carrier mobility reached 213 cm² V⁻¹ s⁻¹ at 295 K and 617 cm² V⁻¹ s⁻¹ at 250 K, comparable to previously reported values for mechanically exfoliated bP. However, a bP photodetector array has not yet been reported, and the optoelectronic properties of large-scale bP films do not yet match those of mechanically exfoliated bP films.

| Accurate characterization of the quality factors of 2D-material photodetectors
Photodetectors are classified as photoconductive or photovoltaic, depending on their structure. The photoconductive type is easy to manufacture but requires a bias current and has a higher dark current; the photovoltaic type can work under zero bias and has low noise, but its fabrication process is more complicated. The performance quality factors of the two types also differ, so it is important to standardize the evaluation methods. Although the photoelectric properties of 2D materials are almost always reported to be excellent, the test data are frequently obtained through nonstandard measurement methods, often using nonstandard illumination sources such as lasers, and the active area of the device is often not clearly defined. Nonstandard measurement methods have seriously hindered the development of these photodetectors, making it impossible to compare measured performances with those of commercial devices. Therefore, an accurate characterization method for the quality factors of a 2D photodetector is included in this paper.
2D-material photodetectors often perform poorly in low-light detection. Most devices fail to detect photo signals in standard measurements using blackbody light sources, and the responsivity and detectivity of some devices drop by two or three orders of magnitude under standard measurement. Taking specific detectivity, an important figure of merit for infrared detectors, as an example: the specific detectivity of a commercial thermistor is 10⁸ Jones, so a meaningful result should exceed the detectivity of such thermal detectors. The calculation of noise should therefore also be standardized, and the dark current should be used with care when calculating the noise-equivalent power (NEP). One of the most important indicators of a photodetector is the responsivity (R), 84 the ratio of the output photocurrent I_ph to the input optical power P: R = I_ph/P. At present, this quantity is usually measured by either a monochromatic laser test or a mixed-light blackbody test. 85 Although the monochromaticity of the laser test is good (Figure 7C), the spot has a Gaussian distribution in the infrared band and its size cannot be measured accurately, which can introduce calculation errors of several orders of magnitude (Figure 7A). We believe that the uniform blackbody test should be the main test in the infrared band (Figure 7B). The device is located at a distance L from the aperture, which can be varied with a slide. The total incident power on the device surface can then be calculated as P = αεσ(T⁴ − T₀⁴)·A·A_n/(πL²), where α is the modulation factor, ε is the average emissivity of the blackbody radiation source (ε = 0.9 for our source), σ is the Stefan-Boltzmann constant, T is the temperature of the blackbody radiation source, T₀ is room temperature (300 K), A is the area of the blackbody radiation source, and A_n is the device area.
The photodetector absorbs the incident photons and generates a photocurrent, which is amplified by a current preamplifier to obtain the final current signal I. The current responsivity R of the device is then I/P. To obtain a more accurate signal, a chopper is introduced into the test optical path to modulate the radiation, and a lock-in amplifier reads the signal to filter out system and background noise. From this, we obtain the blackbody responsivity.
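As a concrete illustration, the incident-power formula and blackbody responsivity described above can be evaluated numerically. In the sketch below, only the formula comes from the text; the test conditions (source temperature, areas, distance, modulation factor, and photocurrent) are hypothetical values chosen for illustration.

```python
# Estimate the optical power incident on a detector from a blackbody source
# and the resulting blackbody responsivity. Symbols follow the text:
# alpha = modulation factor, eps = emissivity, sigma = Stefan-Boltzmann
# constant, T/T0 = source/room temperature, A_src/A_det = source/device areas.
import math

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def blackbody_power(alpha, eps, T, T0, A_src, A_det, L):
    """P = alpha * eps * sigma * (T^4 - T0^4) * A_src * A_det / (pi * L^2)."""
    return alpha * eps * SIGMA * (T**4 - T0**4) * A_src * A_det / (math.pi * L**2)

def blackbody_responsivity(I_photo, P):
    """R_b = I / P, in A/W."""
    return I_photo / P

# Hypothetical test conditions: 500 K source, 300 K ambient, 1 cm^2 source
# aperture, 100 um x 100 um device, placed 30 cm from the aperture.
P = blackbody_power(alpha=0.35, eps=0.9, T=500.0, T0=300.0,
                    A_src=1e-4, A_det=1e-8, L=0.30)
R_b = blackbody_responsivity(I_photo=2.0e-9, P=P)  # 2 nA measured photocurrent
print(f"incident power P = {P:.3e} W, blackbody responsivity R_b = {R_b:.3f} A/W")
```

Note that with a lock-in measurement, I here is the modulated signal current at the chopping frequency, not the raw DC output.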
To obtain the relationship between the responsivity of the detector and the radiation wavelength, a spectral response test is required. In this study, grating and Fourier-transform infrared (FTIR) spectroscopy were combined for this purpose. In general, the blackbody response test yields the blackbody responsivity R_b and blackbody detectivity D*(blackbody), and the FTIR test yields the relative response spectrum R₀(λ), normalized so that R₀(λ_p) = 1 at the peak wavelength λ_p. The ratio of the peak responsivity to the blackbody responsivity of an infrared detector is a constant, the g factor; thus, by calculating g, we can obtain the peak responsivity R(λ_p) and peak detectivity D*(λ_p). 86 To calculate g, we first note that by definition R(λ) = R(λ_p)R₀(λ). Because blackbody radiation has a continuous spectrum and the emissive power at each wavelength differs (Figure 7D), the signal produced by the photodetector is the sum of the signals produced by each wavelength of radiation: I = R(λ_p)∫R₀(λ)ϕ(λ)dλ, where ϕ(λ) is the blackbody radiation power distribution. Since R_b = I/∫ϕ(λ)dλ, the g factor is g = R(λ_p)/R_b = ∫ϕ(λ)dλ/∫R₀(λ)ϕ(λ)dλ. The response spectrum obtained by the Fourier spectrometer is only a relative response spectrum; the absolute responsivity spectrum of the photodetector is obtained after blackbody-response calibration. The relationship between the peak responsivity R(λ_p) and the blackbody responsivity R_b of the photodetector is then R(λ_p) = gR_b (Figure 7E). The normalized detectivity (D*) is defined as the output signal-to-noise ratio (SNR) generated by unit incident power, normalized to unit photosensitive area and unit bandwidth: D* = √(AΔf)/NEP, where NEP is the noise-equivalent power. Its unit is the Jones (i.e., cm Hz^1/2 W⁻¹). 87 The best performance is obtained when the detector and amplifier noise are low. Under these conditions, the main noise source is the discreteness of the radiation field itself, that is, photon noise. This limiting value of detector performance is called the background limit.
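The g-factor calculation above can be carried out numerically once the relative response spectrum is known. The sketch below uses a made-up Gaussian R₀(λ) peaking at 4 µm as a stand-in for a measured FTIR spectrum; the 500 K source temperature and the blackbody responsivity value are likewise hypothetical, so this illustrates the procedure rather than any real device.

```python
# Numerically evaluate g = integral(phi) / integral(R0 * phi) over the source
# band, where phi(lambda) is the Planck distribution of the blackbody source
# and R0(lambda) is the normalized relative response (R0(lam_p) = 1).
import math

H, C, KB = 6.62607015e-34, 2.99792458e8, 1.380649e-23  # SI constants

def planck(lam, T):
    """Blackbody spectral radiant exitance at wavelength lam (m), in W m^-3."""
    return (2 * math.pi * H * C**2 / lam**5) / (math.exp(H * C / (lam * KB * T)) - 1)

def g_factor(r0, T, lam_lo, lam_hi, n=20000):
    """Trapezoid-rule evaluation of the two integrals defining g."""
    dl = (lam_hi - lam_lo) / n
    num = den = 0.0
    for i in range(n + 1):
        lam = lam_lo + i * dl
        w = 0.5 if i in (0, n) else 1.0
        p = planck(lam, T)
        num += w * p
        den += w * r0(lam) * p
    return num / den

# Hypothetical normalized response: Gaussian peaked at 4 um.
r0 = lambda lam: math.exp(-(((lam - 4e-6) / 0.5e-6) ** 2))

g = g_factor(r0, T=500.0, lam_lo=1e-6, lam_hi=14e-6)
R_peak = g * 0.58  # R(lam_p) = g * R_b, with a hypothetical R_b = 0.58 A/W
print(f"g = {g:.2f}, peak responsivity R(lam_p) = {R_peak:.2f} A/W")
```

Because R₀(λ) ≤ 1 everywhere, g is always greater than 1: the narrower the detector's response band relative to the source spectrum, the larger the correction from blackbody to peak responsivity.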
From the photon-noise formula, the background-limited (BLIP, background-limited infrared photodetector) D* of photovoltaic and photoconductive detectors at 300 K can be deduced, 87 as shown in Figure 7F. In principle, the BLIP value is the performance limit of these detectors.
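To make the background limit concrete, the photon-noise-limited D* can be computed from the background photon flux. The sketch below uses the standard textbook expressions D*_PV = (λ_c/hc)·√(η/(2Q_B)) and D*_PC = D*_PV/√2 (these exact forms are not written out in the text), with a hypothetical 5 µm cutoff, unity quantum efficiency, a 300 K background, and a 2π field of view.

```python
# Background-limited (BLIP) D* from photon noise. Q_B is the background photon
# flux density integrated up to the cutoff wavelength over a 2*pi field of view.
import math

H, C, KB = 6.62607015e-34, 2.99792458e8, 1.380649e-23  # SI constants

def photon_exitance(lam, T):
    """Blackbody spectral photon exitance, photons m^-2 s^-1 per m wavelength."""
    return (2 * math.pi * C / lam**4) / (math.exp(H * C / (lam * KB * T)) - 1)

def background_flux(lam_c, T, n=50000):
    """Q_B: photon flux integrated from ~0 to lam_c (trapezoid rule), m^-2 s^-1."""
    lam_lo = 0.2e-6  # the 300 K flux below 0.2 um is utterly negligible
    dl = (lam_c - lam_lo) / n
    q = 0.0
    for i in range(n + 1):
        w = 0.5 if i in (0, n) else 1.0
        q += w * photon_exitance(lam_lo + i * dl, T)
    return q * dl

lam_c, T_bg, eta = 5e-6, 300.0, 1.0  # 5 um cutoff, 300 K background, unity QE
Q_B = background_flux(lam_c, T_bg)
d_pv = (lam_c / (H * C)) * math.sqrt(eta / (2 * Q_B)) * 1e2  # m -> cm for Jones
d_pc = d_pv / math.sqrt(2)  # photoconductive BLIP is sqrt(2) lower than PV
print(f"Q_B = {Q_B:.2e} m^-2 s^-1, BLIP D*: PV {d_pv:.2e}, PC {d_pc:.2e} Jones")
```

Comparing a measured D* against this limit is a quick sanity check: a room-temperature claim far above the BLIP curve for the same cutoff wavelength should be treated with suspicion.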
We also calculated the D* of some low-dimensional-material detectors for comparison. D* is usually expressed as D* = R√(AΔf)/i_n, where i_n is the noise current (which differs between photoconductive and photovoltaic detectors), A is the photosensitive area, and Δf is the measurement bandwidth (1 Hz for a 1 s integration time). The gain of photoconductive detectors based on 2D materials is usually higher than 1. In the dark state of a photoconductive detector, the two main noise sources are thermal noise (i_t) and generation-recombination noise (i_gr). Both of these noises are related to the internal gain, so it is not accurate to calculate D* from the dark current I_d for a detector with gain; the result should be divided by the gain. Failure to divide by the gain is why reported values of the performance of low-dimensional-material photoconductive detectors are, in general, artificially high. Of course, the response time of high-gain photodetectors is affected by the gain to different degrees, depending on its source. For example, the gain of an avalanche photodetector comes from avalanche multiplication, so it has little effect on the response time. By contrast, gain caused by internal trap states has a considerable impact on the response time.
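A minimal sketch of the gain correction described above: the naive D* is computed from dark-current shot noise alone and then divided by the internal gain, as the text prescribes. All device numbers (responsivity, gain, dark current, area) are hypothetical and chosen only to show the size of the error.

```python
# Compare the naive D* (dark-current shot noise, gain ignored) with the
# gain-corrected value for a photoconductive detector with internal gain.
import math

Q = 1.602176634e-19  # elementary charge, C

def d_star(R, area_cm2, bandwidth, i_noise):
    """D* = R * sqrt(A * df) / i_n, in Jones when the area is in cm^2."""
    return R * math.sqrt(area_cm2 * bandwidth) / i_noise

R = 100.0        # responsivity, A/W (high because of internal gain)
gain = 1e3       # internal photoconductive gain (e.g., from trap states)
I_dark = 1e-6    # dark current, A
area_cm2 = 1e-4  # photosensitive area, cm^2
df = 1.0         # measurement bandwidth, Hz

i_shot = math.sqrt(2 * Q * I_dark * df)  # naive shot-noise estimate from I_dark
d_naive = d_star(R, area_cm2, df, i_shot)
d_corrected = d_naive / gain             # correction described in the text

print(f"naive D* = {d_naive:.2e} Jones, gain-corrected D* = {d_corrected:.2e} Jones")
```

With these numbers the uncorrected figure is a full three orders of magnitude too high, which is exactly the scale of overestimate the text warns about.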
The field of photodetectors based on 2D materials is booming and has vast potential. 84 However, accurate test methods are urgently needed to compare the performance indicators of different devices. The common practice of using dark-current and laser tests to determine device detectivity is inadequate, because it ignores key parameters such as the spot size and Gaussian power distribution of the laser and the relationship between dark current and noise in devices with gain. These omissions may overestimate D* by orders of magnitude. We therefore suggest that the SNR and NEP of a device be characterized directly by combined measurement of the noise spectral density and the irradiated photocurrent at a fixed modulation frequency (with a 1 Hz bandwidth). Such a strict calculation process is essential for verifying high D* values and promoting the application of 2D photodetectors.
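The direct SNR/NEP characterization suggested here can be sketched as follows. The measured noise spectral density, photocurrent, calibrated incident power, and device area are all hypothetical placeholder values; the point is that no assumption about the gain or the dark current is needed.

```python
# Derive NEP and D* directly from the noise spectral density and the modulated
# photocurrent measured at the same fixed chopping frequency (1 Hz bandwidth).
import math

def nep(noise_density, responsivity):
    """NEP = S_n / R, in W / Hz^1/2."""
    return noise_density / responsivity

def d_star_from_nep(area_cm2, nep_value):
    """D* = sqrt(A) / NEP for a 1 Hz bandwidth, in Jones."""
    return math.sqrt(area_cm2) / nep_value

S_n = 5e-13              # measured noise current density, A / Hz^1/2
I_ph, P_in = 2e-8, 1e-9  # photocurrent (A) and calibrated incident power (W)
R = I_ph / P_in          # responsivity from the irradiated measurement, A/W

NEP = nep(S_n, R)
D = d_star_from_nep(1e-4, NEP)  # hypothetical 1e-4 cm^2 photosensitive area
print(f"R = {R:.1f} A/W, NEP = {NEP:.2e} W/Hz^1/2, D* = {D:.2e} Jones")
```

Because the noise is measured rather than inferred from the dark current, this procedure automatically accounts for gain, 1/f noise, and any other excess noise present at the modulation frequency.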

| CONCLUSION
2D-material photodetectors provide solutions for next-generation machine-vision problems. However, 2D materials are not yet ready for practical use in intelligent photoelectric sensors. Technical issues that still need to be addressed include techniques for the stable growth of large-scale crystals, standardization of evaluation methods, and large-scale heterogeneous integration. Existing intelligent sensing based on 2D materials remains small in scale, and system completeness is the main outstanding issue. A roadmap for next-generation MVSs based on 2D materials is shown in Figure 8. In principle, three-dimensional integration of low-dimensional and other materials is possible and would significantly improve system area efficiency, but achieving high-density heterogeneous integration will require advances in selective 2D-material etching. The connection density of the alignment circuitry must also be considered. In terms of algorithms, existing MVSs based on 2D materials mainly focus on improving recognition accuracy. However, accuracy is difficult to compare between systems with different structures, and a new quality factor needs to be introduced to evaluate next-generation MVSs based on 2D materials. In terms of hardware, the trend has been toward reduction rather than elimination of the analog-to-digital (AD) conversion module and auxiliary modules. For example, in Mennel's work, 9 external memory is required to store the weight values, and a computer is required to compute the loss and classification functions. In the future, cutting-edge machine-learning algorithms and neural-network structures will be integrated into image sensors. Therefore, the development of new MVS architectures requires cross-domain exchange among researchers and the collaboration of multiple disciplines, including materials science, semiconductor physics, computer science, and data science.