Engineering Strategies for Advancing Optical Signal Outputs in Smartphone‐Enabled Point‐of‐Care Diagnostics

The use of smartphone-based analysis systems has been increasing over the past few decades. Important reasons for their popularity include their ubiquity, increasing computing power, relatively low cost, and capability to acquire and process data simultaneously in a point-of-need fashion. Furthermore, smartphones are equipped with various sensors, notably the complementary metal–oxide–semiconductor (CMOS) image sensor. The high sensitivity of the CMOS sensor allows smartphones to be used as colorimeters, fluorimeters, and spectrometers, forming an essential part of point-of-care testing and contributing to e-health and beyond. However, despite their many merits, smartphone-based diagnostic devices still face many challenges, including high susceptibility to illumination conditions, difficulty in adapter uniformization, and low interphone repeatability. These problems may hinder smartphone-enabled diagnostics from meeting FDA regulations for medical devices. This review discusses the design and application of current smartphone-based diagnostic devices, highlights the challenges associated with existing methods, and offers perspectives on how to address those challenges through engineering approaches to constant color signal acquisition, including smartphone adapter design, color space transformation, machine learning classification, and color correction.

and spectrometry, has been utilized to detect an enormous range of biomarkers. [12] Colorimetric and fluorescent biosensing can be realized in the gas phase, [13] in the liquid phase, [14] or on lateral flow assay (LFA) strips, [15] cellulose test paper, [16] microfluidic chips, [17] and so on. The color information of images captured by smartphones can be analyzed instantly using a predesigned smartphone application (APP), or by software on a computer after data transfer. [18] However, constancy of color information is difficult to achieve for smartphone-based optical detection, considering its high susceptibility to illumination conditions, [18] the difficulty of adapter uniformization, [19] and low interphone repeatability. [20] Constant and uniform illumination conditions can be maintained by attaching extra devices with independent light sources to smartphones. However, such adapters are difficult to apply to all types of smartphones considering their different shapes and sizes, especially those for fluorescence applications; the requirement of being lightproof restricts the flexibility of adapters. [21] In addition, photos taken by smartphones of different models can show significantly different colors due to different postprocessing and inherent differences in the spectral sensitivity of the camera. [22] The raw signals captured by CMOS sensors through a color filter array (CFA) are processed nonlinearly during color space transformation (CST) and lossy compression before being stored in memory, increasing the difficulty of color calibration. [23] Image autocorrections such as automatic exposure correction, white balance (WB), tone manipulation, brilliance adjustment, saturation adjustment, and contrast enhancement can strongly affect the color values.
[24] Moreover, the color of photos taken by the same smartphone also varies with shooting conditions, including shooting distance, imaging angle, and camera settings such as ISO (sensor sensitivity), exposure time, and aperture. One strategy to solve these problems is to generate a calibration curve for each type of smartphone camera with fixed settings and to capture the assay signals with exactly the same settings. However, this procedure is clearly inconvenient. Some researchers circumvented accurate color acquisition by developing count-based [25] or distance-based [26] biosensors. Unfortunately, not all types of assays can be adapted to these forms, so accurate color acquisition remains necessary. These challenges associated with smartphone-based POCT have attracted wide attention from researchers.
There have been many reviews of smartphone-based POCT systems. González et al. [27] provided an overview of adapter design for different types of biosensors. Liu et al. [28] summarized the features of the biosensors (paper-based sensors, flexible devices, microfluidic chips, etc.) currently widely used in smartphone-based POCT. Nonno et al. [9] investigated different classes of smartphone-based devices (smartphone-based colorimeters, photometers and spectrometers, and fluorimeters). Sun et al. [29] summarized the development and application of mobile APPs for smartphone-based POCT. Kordasht et al. [30] summarized different types of smartphone-based immunosensors based on electrochemical, colorimetric, and optical techniques. Kap et al. [31] analyzed different smartphone-based colorimetric detection systems for glucose monitoring. Yang et al. [32] investigated different detection platforms based on colorimetric, luminescent, and magnetic assays. Biswas et al. [33] summarized recent developments in smartphone-spectrometer sample analysis. Qin et al. [34] emphasized different signal analysis algorithms for lateral flow immunoassay detection. Although most of these reviews pointed out the current challenges of smartphone-based optical diagnostics, no recent review has provided a comprehensive analysis and summary of existing strategies for improving signal readout quality. This review emphasizes four engineering strategies for constant color signal acquisition: smartphone adapter design, CST, machine learning (ML), and color correction, as shown in Scheme 1.

Smartphone Adapter Design and Engineering
The color signal detected by the camera sensor is determined by the product of the irradiance, the reflectance of the imaging target, and the spectral sensitivity of the camera. [35] Although adapters for light shielding do not ensure the same spectral sensitivity across camera sensors, they do guarantee the constancy of irradiance and reflectance to a great extent with simple operation, greatly improving the performance of smartphone-based optical biosensing.
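This product relationship can be sketched numerically: up to the camera's nonlinear processing, the value recorded in channel k is proportional to the wavelength integral of illuminant irradiance × target reflectance × channel spectral sensitivity. A toy illustration with invented Gaussian spectra (all curves and numbers are assumptions for demonstration, not measured data):

```python
import numpy as np

def gaussian(lam, center, width):
    """Toy spectral curve (illustrative only)."""
    return np.exp(-0.5 * ((lam - center) / width) ** 2)

lam = np.linspace(380, 700, 321)          # wavelength grid, nm
dl = lam[1] - lam[0]
irradiance = gaussian(lam, 550, 120)      # illuminant E(lambda)
reflectance = gaussian(lam, 620, 40)      # reddish target R(lambda)
sensitivity = {                           # camera channel S_k(lambda)
    "R": gaussian(lam, 600, 40),
    "G": gaussian(lam, 540, 40),
    "B": gaussian(lam, 460, 40),
}

# Channel response: c_k ~ integral of E * R * S_k over wavelength
response = {k: float(np.sum(irradiance * reflectance * s) * dl)
            for k, s in sensitivity.items()}
print(response)  # the reddish target excites the R channel most
```

Any change in the illuminant E(λ) or in the channel sensitivities S_k(λ) shifts all three responses, which is exactly why light-shielding adapters with a fixed internal light source improve repeatability.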

Colorimetry
The conventional approach to colorimetric biosensing is to derive a calibration curve based on a single channel, or a combination of multiple channels, of a color model that yields the highest correlation between color intensities and analyte concentrations. Even though such a calibration curve performs well in a controlled environment, the color intensity values will be biased under ambient light sources due to their high sensitivity to the illumination source. [36] In particular, when photos are taken from above, the smartphone or the user can block some light and cast shadows on the sensing platform. Some researchers took photos at a certain angle and allowed ambient light to come through from one side. [37] However, the light intensity differs from place to place and time to time, leading to inaccurate results. To solve this problem, some researchers used the ratio of color intensity between the test zone and the control zone or background region, [38] but the interference may not be eliminated completely. Other researchers constructed the standard curve together with the measurement of samples to ensure the same light conditions. [39] However, this greatly increases the complexity of the detection procedure. Therefore, to allow measurement without constructing a standard curve each time, an adapter that blocks surrounding light and carries extra light sources providing homogeneous and constant light is often used, so a standard curve can be prepared in advance and stored in the APP (Figure 1). Patients only need to take one photo, and results can be given directly. Although an extra device adds inconvenience, using a housing is the easiest way to ensure high constancy of color intensities. To achieve a homogeneous light environment in the housing, an optical diffuser, such as translucent glass, can be placed over the light source (Figure 1a,b). [40] Alternatively, instead of irradiating samples directly, light reflected from the wall of the housing can be used.
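The pre-stored standard curve is typically a linear fit of a channel intensity against known analyte concentrations, inverted at readout time. A minimal sketch (all values invented):

```python
import numpy as np

# Standards imaged inside the light-tight adapter (illustrative values only):
conc = np.array([0.0, 2.5, 5.0, 10.0, 20.0])          # known concentrations
green = np.array([200.0, 185.0, 170.0, 140.0, 80.0])  # mean G intensity in ROI

# Fit intensity = slope * conc + intercept; the APP stores the coefficients.
slope, intercept = np.polyfit(conc, green, 1)

def concentration_from_intensity(i):
    """Invert the stored calibration curve for a new sample photo."""
    return (i - intercept) / slope

print(round(concentration_from_intensity(155.0), 3))  # 7.5
```

Because the fit is prepared under the adapter's fixed illumination, the inversion is only valid for photos taken under that same illumination — the central point of this section.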
For water sample imaging, to avoid light reflection from the water surface, the light source can be placed under the water sample, and the transmission of light is measured. [41] Alternatively, a polarizer can be placed in front of the smartphone camera to reduce reflections from the target surface. [42] The light sources can be light-emitting diodes (LEDs), lasers, smartphone flashlights, electroluminescent sheets, etc. They are usually powered by batteries or by the smartphone through a universal serial bus (USB) on-the-go (OTG) connection (Figure 1b). [40a,43] Interestingly, Iqbal et al. [44] used the phone's screen as the light source and the front-view camera as the detector, so no extra light source is required. However, the emission spectra of different smartphone screens differ, which may induce variation in illumination conditions. The same holds for using the flashlight, which neither provides the same illumination across smartphones nor ensures even lighting. [45] The size of the adapter is primarily limited by the long focus distance of smartphones, which can be reduced by an extra biconvex lens (Figure 1a,b). [46] Fu et al. [47] developed a very small cylinder-shaped adapter (Figure 1c). Instead of using the camera at the back of the smartphone, they used the ambient light sensor to measure the light signals, so no extra convex lens or diffuser was needed. Park et al. [48] also used the ambient light sensor, and it was claimed that it could exclude the influence of different illumination conditions. Overall, a generalized smartphone adapter for colorimetric biosensing includes a convex lens, LED modules, an optical diffuser, a color checker (discussed in Section 5), and a holder for all of the components. The adapter can be fixed to the smartphone, so the position of the sensing platform relative to the smartphone camera is defined, making it easier to target the detection signal. However, this usually restricts the versatility of the adapter (i.e., the adapter has to be a certain shape and size to fit certain smartphones). Advanced adapters can be designed to be adjustable to fit different smartphone models and camera focal lengths.

www.advancedsciencenews.com www.advintellsyst.com

Figure 1 (partial caption). ii) The internal structure of the imaging stage with a light illumination pathway during a readout. Reproduced with permission. [40b] Copyright 2020, American Chemical Society. b,i) Schematic of a smartphone adapter with components including a lens and a light diffuser; ii) The attachment to the smartphone; the sensor was connected to the smartphone through USB-OTG. Reproduced with permission. [40a] Copyright 2021, MDPI. c) Schematic of a smartphone adapter using the ambient light sensor. Reproduced with permission. [47] Copyright 2016, Royal Society of Chemistry. d) Illustration of i) a reflection film module and ii) a mirror surface module in the detection chamber. Reproduced with permission. [52] Copyright 2017, Springer Nature.
There are other types of optical biosensors based on luminescence, including chemiluminescence, [49] thermochemiluminescence, [50] and electrochemiluminescence. [51] The housing design is similar to the aforementioned structures. Because the light signal is emitted from the sample itself, no extra light sources are needed; however, a completely lightproof housing becomes crucial for detection. Moreover, to gather more photons, a mirror or reflective material can be coated inside the adapter (Figure 1d). [50,52,53] A longer camera exposure time is often used for luminescence detection. [54] The adapter can be designed to be flexible, but this increases the cost and makes it difficult to guarantee light tightness. Some studies used a simple light box onto which users manually place the smartphone, although an additional automatic color-retrieving algorithm is then usually required [55] (Figure 2), or the test region needs to be manually selected by users. [39] Some applications of smartphone-based colorimeters are listed in Tables 1 and 2.

Fluorimetry
Figure 2. Illustration of automatic color-retrieving results and algorithms. a) The color chart localization. Reproduced with permission. [142] Copyright 2020, IEEE. b) Circle-shape test region localization. Reproduced with permission. [145] Copyright 2017, IEEE. c) Automatic color-retrieving algorithm: i) Noise reduction and image simplification; ii) Edge detection; iii) Initial contour extraction; iv) Final contour labeling; v) Automatic position determination to obtain reference colors and target colors; vi) Retrieved reference colors and target colors. Reproduced with permission. [55a] Copyright 2017, MDPI. d,i) Automatic color-retrieving algorithm; ii) Original image; iii) ROI extraction. Reproduced with permission. [155] Copyright 2019, MDPI.

A fluorimeter measures the intensity of light emitted from a fluorophore upon excitation by light of a certain wavelength. An extra light source and a lightproof housing are always required. The major difference from the colorimeter is that the excitation light source is placed perpendicular (Figure 3a) or nearly perpendicular (Figure 3b,c) to the emitted light. [56] If the sensing platform is 2D, such as a test paper, it is difficult to set a 90° angle between the emission light and the excitation light. To solve this problem, a dichroic mirror, which allows light with certain wavelengths to pass through while reflecting light with other wavelengths, is usually utilized to separate the emission light and the excitation light (Figure 3d). [57] To reduce the background signal and increase detection accuracy, a light filter can be embedded to block the excitation light and pass only the emission light to the CMOS sensor. [58] The excitation light sources are usually LEDs with a certain wavelength or lasers, on the grounds that fluorescence occurs only at specific wavelengths of light.
This limits the generality of smartphone-based fluorimeter housings, because the light sources and the light filters need to be changed for different biosensors. Lee et al. [59] partially solved this problem by developing an adapter with a revolvable filter case containing four bandpass filters and a laser sheath that allows the light source to be changed (Figure 3e). Some applications of smartphone-based fluorimeters are listed in Table 3.
Generally, a smartphone adapter for fluorescent biosensors should include a convex lens, excitation light sources, excitation light filters, dichroic mirrors, emission light filters, and a lightproof holder. To detect fluorescent signals with different emission and excitation wavelengths, a filter set can be embedded in the adapter. However, this approach is inconvenient and limited because it is impossible to cover the excitation and emission spectra of all fluorophores. The spatial approaches, which rely on the dispersion of light using either a prism or a diffraction grating as discussed in Section 2.3, can be considered for the future design of adapters for fluorescent biosensors. After separating light according to wavelength, light with the desired wavelengths can be selected according to its location in the spectrum using adjustable slits and gathered by photomultipliers.

Spectrophotometry
A spectrophotometer is used to separate and measure the spectral components of a beam of light. [33] Compared with colorimeters and fluorimeters, which only give information about superimposed light spanning a wide range of wavelengths or light of one certain wavelength, spectrometers provide a spectrum, thereby enabling surface plasmon resonance (SPR)-based or surface-enhanced Raman scattering (SERS)-based biosensing. SPR spectroscopy is a powerful label-free technique that enables noninvasive real-time monitoring of noncovalent molecular interactions. [60] By observing peak shifts in the spectrum, the concentration of analytes can be determined. Fan et al. [61] used smartphone-based SPR spectroscopy for the detection of CA125 and CA15-3 (Figure 4a). The limits of detection reached 4.2 and 0.87 U mL⁻¹, respectively. Walter et al. [62] detected 25-hydroxyvitamin D in human serum samples using a gold nanoparticle (AuNP) aptamer-based assay and achieved a sensitivity of 0.752 pixel nM⁻¹. The light was guided by optical fibers and integrated with microfluidic chips (Figure 4b). Besides SPR-based spectroscopy, another popular technique is SERS, which refers to Raman scattering enhanced with the assistance of nanomaterials. [63] The Raman scattering spectrum of each molecule can serve as a fingerprint and therefore can be used for the detection of various analytes. [64] To obtain the light spectrum through smartphones, additional components are required. First, the light beam can be separated according to wavelength through a prism or diffraction grating (Figure 4c,d). [65] Light with different wavelengths will be diffracted by different angles, resulting in the formation of a spectrum. The spectrum can be captured by the smartphone, and the light intensity of each column can be calculated.
After calibration, the wavelength can be determined by its location, and a spectrum can be generated. Second, a light source is necessary. Commonly used light sources include LEDs, [61] smartphone flashlights, [62] or sunlight, [66] with LEDs being the most commonly used due to their low cost, the accessibility of a wide range of wavelengths, and their application versatility. [9] CloudMinds Inc. invented a portable smartphone-based SERS spectrophotometer (Figure 4e). By applying a slit space-coupling technique and a volume-phase holographic transmission grating, the sensitivity is improved while the size of the device is reduced. [67] The device can be integrated onto the back of the smartphone and communicates through the Smartport interface. However, instead of using the CMOS sensor of the smartphone, a charge-coupled device is used for signal acquisition. The data are then transferred to a cloud network through Wi-Fi or 3G/4G/5G. The server smooths the spectrum, and the baseline is fitted and subtracted to obtain a pure Raman spectrum. [68] Several research groups have verified this device. The details are listed in Table 4.
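The pixel-to-wavelength calibration described here is commonly a linear (or low-order polynomial) map anchored by reference lines of known wavelength. A sketch assuming two reference laser lines (pixel positions invented):

```python
import numpy as np

# Pixel columns where two reference laser lines appear on the sensor image
# (illustrative values), and their known wavelengths:
ref_pixels = np.array([212.0, 698.0])
ref_wavelengths = np.array([532.0, 650.0])   # nm (green and red lasers)

# Linear dispersion model: wavelength = a * pixel + b
a, b = np.polyfit(ref_pixels, ref_wavelengths, 1)

def pixel_to_wavelength(px):
    """Place any spectral feature on the wavelength axis."""
    return a * px + b

print(round(pixel_to_wavelength(455.0), 1))  # 591.0
```

With more reference lines, a quadratic fit can absorb residual grating nonlinearity; the principle is unchanged.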
Because in this case the smartphone only serves as a result receiver, the difference in smartphone camera spectral sensitivity is circumvented. However, if the smartphone camera is used for signal acquisition, color correction is necessary to ensure that different smartphones produce the same color value for the same color signal. The details will be discussed in Section 5.
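The SPR peak-shift readout mentioned earlier can be sketched as locating the resonance wavelength before and after analyte binding; the shift is then mapped to concentration via a calibration curve. A toy illustration with synthetic Gaussian bands (all numbers invented):

```python
import numpy as np

lam = np.linspace(450, 700, 501)  # wavelength axis, nm (0.5 nm steps)

def lspr_spectrum(peak_nm):
    """Toy localized-SPR extinction band (Gaussian; illustrative only)."""
    return np.exp(-0.5 * ((lam - peak_nm) / 30.0) ** 2)

before = lspr_spectrum(530.0)   # bare AuNPs
after = lspr_spectrum(538.0)    # analyte binding red-shifts the band

def peak_wavelength(spectrum):
    """Locate the extinction maximum on the wavelength axis."""
    return lam[np.argmax(spectrum)]

shift = peak_wavelength(after) - peak_wavelength(before)
print(shift)  # red shift in nm
```

In practice the peak position would be refined by fitting (e.g., a parabola through the points around the maximum), since the shift of interest can be smaller than the pixel-limited wavelength step.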

Color Spaces
Figure 3 (partial caption). Reproduced with permission. [58] Copyright 2018, Royal Society of Chemistry. b) Smartphone-based particle diffusometry platform: i) The optics setup within the platform includes an external ball lens, a filter, and a laser at a 15° incident angle; ii) Schematic of the adapter; iii) Integration of the adapter with the smartphone. Reproduced with permission. [156] Copyright 2020, Elsevier. c) Schematic diagram of a smartphone-based fluorescence spectrum reader. Reproduced with permission. [157] Copyright 2018, Elsevier. d) Working principle of a dichroic mirror. Reproduced with permission. [158] Copyright 2019, Elsevier. e,i) Schematic diagram of a smartphone-based fluorimeter with a selectable filter; ii) A demonstration of fluorescent imaging with a real object. Reproduced with permission. [59] Copyright 2017, Elsevier.

Table 3. Designs and applications of smartphone-based fluorimeters.

Figure 4. a) Schematic of the attachments of the smartphone biosensor system with multitesting units based on localized SPR integrated with microfluidic chips: i) The working principle of the system; ii) The schematic of the attachments; iii) The detail of the case; iv) The detail (top view) of the stage of the microhole array and small-lens array; v) The picture of the attachments and the smartphone. Reproduced with permission. [61] Copyright 2020, MDPI. b) Conceptual design of the all-optical planar polymer-based biochip sensor platform for smartphones. Reproduced with permission. [62] Copyright 2020, MDPI. c) Light diffraction through a prism. d) Light diffraction through a diffraction grating. e) Schematic illustration of the SERS measurement using the smart SERS terminal, adapter, and SERS chips. Reproduced with permission. [159] Copyright 2019, Royal Society of Chemistry.

Color changes in liquids, on test strips, or in hydrogels can often be used for quantitative detection of various analytes through different chromogenic reactions. These colors can be abstracted numerically so that they can be stored and interpreted on devices/computers. The value of this mathematical representation varies with different color models, including RGB, HSV, HSL, CIE XYZ, CIE L*a*b* (or LAB), etc. The most common color model is RGB. It is an additive color model in which the red, green, and blue monochromatic (single-wavelength) primary colors of light are added together in various combinations to reproduce multiple colors. [69] Different devices may use different primary colors and yield different RGB color spaces, meaning that the same set of tristimulus values may refer to different colors (Figure 5a). Thus, RGB is specific to one imaging device and is termed device dependent. RGB is coded on 256 levels, from 0 to 255. The value represents the light intensity of the corresponding primary beam. For instance, a zero value of R, G, and B indicates the absence of light and represents black. The maximum value of R, G, and B means superimposition of fully-on light beams, representing white color.
Different colors can be represented by different combinations of R, G, and B values. However, it is not easy to intuitively tell the tristimulus values of a specific color. Therefore, HSV and HSL were created to allow easier color adjustment (Figure 5b). HSV is a cylindrical color model that remaps colors into three dimensions that are more understandable to humans: hue, saturation, and value. Hue describes the dominant color family, saturation represents purity, and value controls the brightness. HSL is another cylindrical color model, whose H and S dimensions are the same as those of HSV. L represents lightness, indicating the color's luminosity; it differs from the V dimension of HSV in that the purest color lies halfway between the black and white ends of the scale (Figure 5b). HSV and HSL are both derived from RGB space and are therefore device dependent.
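These remappings are available in Python's standard-library `colorsys` module; note that it works on 0–1 floats and returns HLS (H, L, S order) rather than HSL:

```python
import colorsys

# A saturated orange in 8-bit RGB:
r, g, b = 255, 128, 0
rf, gf, bf = r / 255, g / 255, b / 255

h, s, v = colorsys.rgb_to_hsv(rf, gf, bf)
hh, l, ss = colorsys.rgb_to_hls(rf, gf, bf)   # careful: H, L, S order

print(f"HSV: hue={h*360:.1f} deg, S={s:.2f}, V={v:.2f}")
print(f"HSL: hue={hh*360:.1f} deg, S={ss:.2f}, L={l:.2f}")
```

For this fully saturated color, both models report S = 1 and the same hue (about 30°, the orange region), while V = 1 and L = 0.5, illustrating the different placement of pure colors on the two lightness axes.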
One issue with the RGB color model is that not all colors that humans can perceive can be represented in RGB without introducing negative values. Therefore, in 1931, the International Commission on Illumination (CIE) created the CIE 1931 XYZ color space, which uses imaginary primary colors so as to avoid negative values for all visible colors. [70] CIE XYZ is also a standard representation for colors. By mapping a color from a device's own color space to CIE XYZ space, the color can be measured or translated to another device-dependent color space.
Figure 5. a) Illustration of different RGB color spaces in the chromaticity diagram. The three vertices of each triangle indicate the three primary colors of the corresponding color space. The area inside the triangle is the gamut of that color space. b) HSV and HSL color spaces. c) Comparison between linear RGB and LAB color space. d) Illustration of LAB color space. Reproduced with permission. [160] Copyright 2009, Springer Nature.

In the XYZ color model, Y corresponds to relative luminance, Z is approximately equal to the B dimension of CIE RGB, and X is a blend of all three. [71] Note that any color space with an explicit relationship to XYZ is said to be device independent. For example, once the three primary colors of an RGB space have determined values in XYZ color space, that RGB space becomes device independent. One limit of the XYZ color space is perceptual nonuniformity, which means that a Euclidean distance on the CIE xy chromaticity diagram is not proportional to the perceived difference between two colors (Figure 5c). [72] Several attempts have been made to derive perceptually uniform color spaces from the CIE XYZ color space. In 1976, the CIE suggested two quasiuniform color spaces, CIE L*a*b* (or LAB) (Figure 5d) and CIE L*u*v* (or LUV). [73] Both systems use L* as the lightness value, which corresponds to the cube root of the luminance. The other two axes, a* and b* for CIELAB or u* and v* for CIELUV, define colorfulness. LAB and LUV are based on the four psychological primary colors: red, green, yellow, and blue. Red and green compose one pair of complementary colors, and yellow and blue form another pair. Colors can mix across pairs but not within a pair. [74] For example, orange is a mix of red and yellow, but mixing red and green produces a nearly black color. The values of a* (or u*) and b* (or v*) represent color changes from red to green and from yellow to blue, respectively.
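Conversion into these device-independent spaces follows standard formulas: undo the sRGB gamma, apply a 3×3 matrix to reach XYZ under a D65 white point, then apply the cube-root mapping to LAB; the CIE76 color difference ΔE*ab is then a Euclidean distance in LAB. A compact sketch (standard constants, illustrative usage):

```python
import numpy as np

def srgb_to_lab(rgb):
    """8-bit sRGB -> CIE LAB (D65 white point), standard formulas."""
    c = np.array(rgb, dtype=float) / 255.0
    # Undo the sRGB gamma to obtain linear RGB
    lin = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    # Linear sRGB -> XYZ (D65)
    m = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = m @ lin
    # Normalize by the D65 white point, then apply the cube-root mapping
    xyz /= np.array([0.95047, 1.0, 1.08883])
    f = np.where(xyz > 0.008856, np.cbrt(xyz), 7.787 * xyz + 16 / 116)
    L = 116 * f[1] - 16
    a = 500 * (f[0] - f[1])
    b = 200 * (f[1] - f[2])
    return np.array([L, a, b])

def delta_e76(rgb1, rgb2):
    """CIE76 color difference: Euclidean distance in LAB."""
    return float(np.linalg.norm(srgb_to_lab(rgb1) - srgb_to_lab(rgb2)))

print(srgb_to_lab((255, 255, 255)).round(2))  # close to [100, 0, 0]
print(delta_e76((200, 30, 30), (180, 40, 35)))
```

A ΔE*ab of roughly 2.3 is often quoted as a just-noticeable difference, which is why many of the assays discussed below quantify color change directly as a Euclidean distance in LAB.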

Applications of Different Color Spaces in Biosensing
Color quantification methods with smartphones using different color spaces have been extensively reported. Each color space performs differently in different applications, and some color spaces have been reported to reduce color variation caused by illumination conditions or to produce higher accuracy. Yang et al. [75] used CIEDE2000, a color-difference formula defined on the LAB color space, for colorimetric urinalysis. With fixed illumination conditions, the accuracy was comparable to that of a urine analyzer. They claimed it was necessary to convert RGB to a uniform color space because the distance between two colors in RGB cannot represent the human sense of color. However, no comparison between RGB and CIEDE2000 was reported in this research. Although they listed the results of other studies using RGB, CIE XYZ, and HSV, showing that CIEDE2000 outperformed other color spaces, these experiments were not conducted under the same conditions and cannot be used for comparison. Baş et al. [76] compared the performance of RGB and LAB for the quantitative measurement of glucose. Under constant illumination conditions, both RGB and LAB gave calibration curves with high linearity. In contrast, under ambient light, which might cause a gray background, LAB still produced high linearity (R² = 0.996), while RGB gave relatively poor linearity (R² = 0.925). However, why LAB is more resistant to illumination variations was not discussed, nor was it reported whether the variation in illumination conditions causes a shift of the color-intensity curve. Kılıç et al. [39] compared two color-matching algorithms for semiquantitative water quality detection based on a single reference image. The first method is based on color correlation, which calculates correlation coefficients between test images and trained images. The second method is based on the Euclidean distance between the test image and the reference image in LAB color space.
It was shown that the second method outperformed the first. Similarly, Komatsu et al. [77] also used the Euclidean distance in LAB color space for the measurement of pH values. The pH indicator showed multiple color changes, and it was claimed that the detectable pH range of this method was wider than that of typical grayscale-based image analysis (Figure 6a). Alvarez et al. [78] used different mathematical combinations of RGB parameters, including grayscale intensity, effective absorbance of R, and the Euclidean distance between the signal color and the background color in RGB color space, for the measurement of tropospheric ozone. The effective absorbance of R showed the best performance. Nguyen et al. [79] demonstrated that using the hue value can eliminate the influence of light source intensities, provided that the assay is based on a bitonal or multitonal color change. Cantrell et al. [80] also verified that the hue value provided better results than RGB for bitonal optical sensors. However, most assays depend on the concentration of one specific color indicator and are therefore monotonal. To solve this problem, Krauss et al. [81] developed a tinting method, for example, changing a white background to blue and a yellow color signal to green, so as to induce a larger change in hue value (Figure 6b). Kestwal et al. [82] used the saturation parameter of the HSV color space for beta-glucan paper-based detection. The signal was normalized by subtracting the background signal, which was obtained from a blank or control paper device. High linearity with R² equal to 0.9953 was achieved. However, no comparison with other parameters or color spaces was reported.

Figure 6. Properties of different color spaces. a,i) RGB did not show a clear trend over pH 2-9; ii) ΔE demonstrated an obvious increasing trend with a low relative standard deviation at each pH value. Reproduced with permission. [77] Copyright 2016, Royal Society of Chemistry. b,i) Monotonal color signals were converted into bitonal color signals after tinting; ii) A hue value difference was created after tinting. Reproduced with permission. [81] Copyright 2017, Royal Society of Chemistry.
The study of Yang et al. [83] found that the S channel of HSV and the S channel of HSI were not affected by the smartphone model in the prediction of soil organic matter content. Shalaby et al. [84] examined the performance of four nonuniform color spaces (RGB, CMY, XYZ, and Yxy), six uniform color spaces (LAB, LCH, Hunter LAB, LUV, HSL, and HSV), ΔE_LAB, and ΔE_LUV in the prediction of Cr(VI) concentration, showing that the use of Yxy slightly improved the linearity of the calibration line and that all uniform color spaces gave at least one signaling parameter exhibiting extraordinary sensitivity and linearity. However, the conclusions may only be applicable to Cr(VI); an investigation of multiple dyes is desired. Nelis et al. [85] used liquids, pH strips, and LFAs to investigate the efficiency of color space channels (RGB, HSV, LAB, and ΔRGB). The color changes were introduced by drop-cast NPs often used for LFAs (i.e., gold, latex, or carbon black NPs) and by oxidized TMB solutions in ELISA wells. The performance of each channel was compared under varying background illumination conditions. Moreover, six smartphones were used for imaging and their calibration curves were compared. Contrary to the conclusions drawn by most other researchers, Nelis et al. [85] showed that L and V performed well in most systems but never outperformed the best RGB channel for a specific test. The H channel performed satisfactorily for color change but poorly for color intensity quantification, and never surpassed the best RGB channel in a particular test either. R worked best for color changes, while B and G functioned best for color intensity changes. The ΔRGB values showed some robustness to errors in individual RGB channels. They also demonstrated that the calibration curve derived from one smartphone could not be directly applied to another smartphone. Overall, opinions regarding which color space performs best vary widely with assays.
Which color space is favorable might depend on the overlap between the absorbance spectrum of the color-signal substance and the spectral sensitivity of the color channel, meaning that each color channel might need to be tested for each assay. Generally, uniform color spaces such as LAB and LUV may outperform nonuniform color spaces for measuring small changes in color, considering that uniform color spaces resemble human perception and human eyes are more sensitive than camera sensors to relatively low light intensities and small color changes. New color spaces or algorithms could be developed to increase the linearity of calibration curves and to be more robust to illumination conditions.
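The per-assay channel screening suggested above can be automated by fitting each candidate channel against the known concentrations and keeping the channel with the highest R². A sketch with synthetic intensities (values invented):

```python
import numpy as np

conc = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
# Mean channel intensities for each standard (synthetic, illustrative):
channels = {
    "R": np.array([220.0, 210.0, 201.0, 180.0, 141.0]),  # nearly linear
    "G": np.array([180.0, 178.0, 171.0, 150.0, 90.0]),   # curved response
    "B": np.array([90.0, 91.0, 89.0, 92.0, 90.0]),       # insensitive
}

def r_squared(x, y):
    """Coefficient of determination for a straight-line fit of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return 1 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)

scores = {k: r_squared(conc, v) for k, v in channels.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 4))
```

The same loop extends naturally to derived channels (H, S, L*, ΔE, channel ratios), turning the channel choice into an empirical step of assay development rather than a fixed convention.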

Machine Learning Classification
ML is a subset of artificial intelligence in which computer models and algorithms (e.g., neural networks) learn patterns from data; the trained model is then applied to make inferences on unseen data. [86] It has emerged as a powerful tool for classification problems due to its flexibility and adaptability to dynamic conditions based on the features extracted from colorimetric information. [87] It can deal with changes in illumination and smartphone models in one stage. Traditional ML typically requires three components to learn a pattern: a dataset, feature engineering, and a classification algorithm (Figure 7). In general, the larger the dataset and the richer the features provided, the higher the achievable accuracy.
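The three components can be sketched end to end. The snippet below is a toy example: the "dataset" is synthetic mean-RGB readings for three concentration classes, and a minimal nearest-centroid rule stands in for the SVM/ANN classifiers used in the studies discussed below; every number in it is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Dataset: synthetic mean-RGB readings for three concentration classes
#    (invented stand-ins; a real study would image assay strips instead).
centers = np.array([[200, 60, 60], [120, 120, 60], [60, 60, 200]], float)
X = np.vstack([c + rng.normal(0, 8, size=(50, 3)) for c in centers])
y = np.repeat([0, 1, 2], 50)

# 2. Feature engineering: normalize each reading by its total intensity,
#    a crude guard against overall brightness changes.
features = X / X.sum(axis=1, keepdims=True)

# 3. Classification algorithm: a minimal nearest-centroid model standing
#    in for the SVM/ANN classifiers used in the cited studies.
centroids = np.array([features[y == k].mean(axis=0) for k in range(3)])

def predict(f):
    return int(np.argmin(((centroids - f) ** 2).sum(axis=1)))

acc = np.mean([predict(f) == t for f, t in zip(features, y)])
```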

ML Classification-Based Colorimetric Detections
Solmaz et al. [88] used support vector machine (SVM) and least squares-support vector machine (LS-SVM) classifier algorithms to classify distinct pH values. SVM is a supervised learning model that separates training data into distinct groups by mapping training examples to a high-dimensional feature space so as to maximize the gap between the two categories. LS-SVM is the least-squares version of SVM, which solves a set of linear equations instead of the convex quadratic programming problem of classical SVMs. [89] In this research, the influence of image format and illumination conditions was investigated. It was hypothesized that, because JPEG images are highly processed and compressed, their relationship with the incoming light intensity is nonlinear, so the final color values cannot be fully trusted. RAW images (derived from the raw sensor data of the digital camera) and RAW-corrected (RAWc) images (RAW images after WB and color transformation to the CIE 1931 XYZ color space) were compared with JPEG images. To mimic the diverse illumination conditions in real life, three different homogeneous light sources were used: sunlight (S), fluorescent (F), and halogen (H). To include more versatile conditions, strips with the same pH level were imaged as groups of four in six different orientations and alignments, including pictures with variable rotations. In another study, the authors of ref. [87] developed an ML-based smartphone APP called "Hi-perox Sens" capable of image capture and analysis of color signals on microfluidic paper-based analytical devices (μPADs) for nonenzymatic colorimetric determination of H2O2 through an iodide-mediated TMB-H2O2 reaction system. Hi-perox Sens uses a Firebase cloud system both for transferring the image to the remote server and for receiving the classification result back in the APP. The μPADs were prepared by adding only two indicators, TMB and KI, or KI only, to the test zones.
The images were captured with four different smartphone models (Oppo A5 2020, Reeder P10, iPhone 5SE, and iPhone 6S) under seven different illumination conditions (H, F, S, HF, HS, FS, and halogen-fluorescent-sunlight). A total of 33 features were extracted from the test zone on the μPADs and fed into different ML classifiers: the mean, skewness, and kurtosis values of 9 color channels (R, G, B, H, S, V, L*, a*, b*), four texture features (contrast, correlation, homogeneity, and energy), entropy, and intensity. 23 ML classifiers were trained, and their performances were compared in terms of classification accuracy. Among these classifiers, linear discriminant analysis and the ensemble bagging classifier outperformed the others for KI and TMB + KI, respectively. A classification accuracy of 97.8% was reached for TMB + KI in the 0-5 mM concentration range, and a classification accuracy of 92.3% was reached for KI in the 0.2-50 mM concentration range, with good interphone repeatability at t = 30 s.
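A few of the per-channel statistics named above (mean, skewness, excess kurtosis) can be computed directly with NumPy; the image below is a random synthetic stand-in for a cropped test zone, and texture features, entropy, and intensity are omitted for brevity:

```python
import numpy as np

def channel_features(img):
    """Per-channel mean, skewness, and excess kurtosis for an HxWx3
    array; a subset of the 33 features described above."""
    feats = {}
    for i, name in enumerate("RGB"):
        x = img[..., i].astype(float).ravel()
        mu, sd = x.mean(), x.std()
        z = (x - mu) / sd
        feats[f"{name}_mean"] = mu
        feats[f"{name}_skew"] = (z ** 3).mean()
        feats[f"{name}_kurt"] = (z ** 4).mean() - 3.0  # excess kurtosis
    return feats

rng = np.random.default_rng(1)
img = rng.integers(80, 120, size=(32, 32, 3))   # synthetic test-zone crop
f = channel_features(img)
```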
Instead of putting the information of all of the color channels into one classifier, Khanal et al. [24] evaluated the performance of RGB, HSV, and LAB separately with four ML classifiers: logistic regression (LR), SVM, random forest, and artificial neural network (ANN). Images of food color assays and PADs for pesticide assays, captured under different illumination conditions by different users with four different smartphone models (Huawei SCC-U21, iPhone 6, Honor 8C, and Samsung Galaxy J7 Max), were used for training and testing. Additionally, a reference zone was created alongside the test zone. The reference zone had the same components as the test zone, but a blank solution without pesticide was added to it when running the assay. Images of both the reference zone and the test zone were fed into the classifiers, in contrast to most other studies, which imaged only the test zone. According to the results, this one-point calibration method provided better accuracy, indicating that the reference color may partially eliminate the influence of different imaging conditions. The ANN model with the LAB color space produced the best concentration prediction accuracy (0.966) for the food color assay, and the SVM model with the LAB color space produced the highest accuracy (0.908) for the enzyme inhibition assay. However, this does not necessarily mean that the LAB color space works better than RGB and HSV; for example, RGB outperformed the other color spaces in terms of cross-validation accuracy for the food color assay using the LR, SVM, and ANN models. Kim et al. [90] performed a similar study, in which three different classifiers (linear discriminant analysis [LDA], SVM, and ANN) and four color spaces (RGB, HSV, YUV, and Lab) were evaluated. No explicit trends among color spaces and classifiers were found. Therefore, the selection of ML models and color spaces should be based on the application and experimental data. More details and examples of ML-based colorimetric detection are listed in Table 5.
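One intuition behind the reference zone can be sketched numerically. Under the simplifying assumption that a lighting change adds a constant color cast to both zones (the cited study instead fed both zones' images to the classifiers; all numbers here are invented), the test-minus-reference difference is unchanged by the cast:

```python
import numpy as np

def one_point_calibrated(test_rgb, ref_rgb):
    """Express the test zone relative to a blank reference zone imaged
    under the same conditions. A purely additive illumination shift
    affecting both zones equally cancels out in the difference."""
    return np.asarray(test_rgb, float) - np.asarray(ref_rgb, float)

# The same assay imaged under two lightings differing by a constant cast.
cast = np.array([15.0, 10.0, -5.0])
test = np.array([180.0, 90.0, 60.0])
ref = np.array([220.0, 210.0, 205.0])
d1 = one_point_calibrated(test, ref)
d2 = one_point_calibrated(test + cast, ref + cast)
```

Real illumination changes are multiplicative and spectrally dependent as well, which is why the cited studies let the classifier learn the relationship rather than relying on a fixed subtraction.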
ML solves problems in a human-like manner, and using ML models to learn from the data is a promising way to factor out the influence of ambient light and interphone variation. However, to achieve high accuracy and high resolution, a very large training dataset, usually over thousands of images, is required. Moreover, classification-based ML only gives semiquantitative results. With coarse-level classifications, the accuracy can reach 100%; however, for fine-grained concentration classifications, the performance may not be satisfactory. To improve the resolution of assays, the training dataset needs to be enlarged as well. For example, to obtain results with a fractional value, one needs a training set with labels specified to decimals. [88] Nevertheless, this would require extensive labor to take all possible values on a continuous scale into consideration. How to produce enough data in a small amount of time is a problem that urgently needs to be solved. Luo et al. [91] developed a high-throughput colorimetric sensor for total organic carbon analysis of environmental water samples. Five different types of ink were overprinted on a carrier to form thousands of interaction points, with each point representing a microreaction and producing independent data. Therefore, thousands of tests could occur simultaneously, and the results could be captured in one image. However, this method may only apply to assays based on simple chemical reactions.
Another challenge for ML classification is that the training dataset has to cover all of the possible conditions. If a domain shift happens in the test data, the prediction results will be much less reliable. For instance, it has been shown that although three different light sources were used for training, when the illumination was changed to a combination of these three light sources, the accuracy decreased from 100% to 80%. [89] If users captured the assay signal under different light sources and different light intensities, the accuracy might decrease further. Yang et al. [83] also demonstrated that data from one smartphone did not fit models constructed from other smartphones very well. Training images captured under more diverse conditions may ensure accuracy, but this also means feeding a very large training set to the classifier, which demands massive resources. In the past few years, convolutional neural networks (CNNs) have been very successful in computer vision. However, CNNs have limited advantages over other ML models here because colorimetry on paper-based devices does not provide texture and shape variation. Developing a PAD that expresses changes in shape and texture according to the analyte concentration could allow us to exploit the full capacity of CNNs and build more accurate methods for estimating analyte levels.

Accuracy Improvement Strategies
Increasing the accuracy of models is always the driving force for the development of ML algorithms. Many approaches have proven effective in improving accuracy, and they can be divided into two main categories: improving the dataset and improving the model.
The size and the quality of the dataset are the key factors that affect classification accuracy. Experiments on multiple models, including CNN and ResNet architectures, show that the accuracy of a model increases with the size of the dataset. [92] The most straightforward method for enlarging a dataset is to generate more labeled data, but this may not be applicable to all tasks. In the case of biosensing and biomedical imaging, data acquisition requires consent from patients and labeling by healthcare professionals, which limits the size of the dataset. In such cases, data augmentation methods are used. For optical sensors, the input of the ML algorithm is usually an image, and there are many augmentation strategies for images. Simple augmentation techniques include geometric transformations, color space augmentations, kernel filters, mixing images, random erasing, etc. [93] Recently, generative adversarial networks (GANs) have been used for data augmentation; they can introduce sources of variance that traditional augmentation methods cannot produce. [94] Using these augmentation strategies, the size of the dataset is enlarged and the accuracy of the ML algorithm is improved.
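Two of the simple techniques named above, a geometric transform (random 90° rotation/flip) and a color-space jitter (per-channel gain), can be sketched with NumPy on a synthetic image; the gain range and image are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def augment(img):
    """Apply a random 90-degree rotation, an optional horizontal flip,
    and a mild per-channel gain jitter to an HxWx3 uint8 image."""
    out = np.rot90(img, k=int(rng.integers(4)))
    if rng.random() < 0.5:
        out = out[:, ::-1]                      # horizontal flip
    gains = rng.uniform(0.9, 1.1, size=3)       # color-space jitter
    return np.clip(out * gains, 0, 255).astype(np.uint8)

base = rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8)
batch = [augment(base) for _ in range(8)]       # 8 variants from one image
```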
The quality of the dataset is determined by many aspects; the three most important factors are comprehensiveness, correctness, and variety. [95] The comprehensiveness of data describes how well the dataset includes all representative samples; a lack of comprehensiveness may lead to a biased model. Missing value treatment, the process of generating samples that fill gaps in the sample population, can increase the comprehensiveness of the dataset. [96] The correctness of data corresponds to the validity of the data and the accuracy of the labels. The ILSVRC12 ImageNet dataset is the most frequently used dataset in classification tasks, and Russakovsky et al. pointed out that it contains mislabeled images. [97] Automatic error detection and cleanup algorithms can correct the mislabeled data and improve the accuracy of ImageNet classification by 2-2.4%. [98] The variety of the dataset measures how closely the distribution of data samples matches that of the overall population. Improving the variety of the dataset relies mostly on the sampling procedure, in which the percentage of samples from a class should approximate the actual percentage of that class. By increasing the quality of the dataset, the accuracy of the algorithm can be improved. [99] There are also many approaches to improving the accuracy of a neural network by working on the model itself. The two major approaches to optimizing the model are the selection of the backbone network and the tuning of hyperparameters. The rapid development of neural networks provides researchers with many choices of models, from traditional CNNs to the popular transformer models. Selecting the correct model for the task can increase the accuracy by a great margin.
There is no gold standard for choosing between models; only through experiments can researchers pick out the optimal one. [100] Tuning the hyperparameters is the most important procedure for ensuring a highly accurate model. Hyperparameters refer to the parameters that cannot be trained during the training process, most of which relate to the model's architecture and the training settings. [101] The most typical hyperparameters in model design are the number of hidden layers and the number of neurons in each layer. The model's capacity to analyze data increases with its size. If the model is too small, it may not be able to utilize all the features of the samples, causing underfitting. [102] On the other hand, if the model is too big, it may learn features from the noise of the training data, decreasing the model's generalization ability and resulting in overfitting. [103] The hyperparameters governing the training procedure also influence the accuracy of the model. Three of the most researched training settings are learning rates, optimizers, and batch sizes. The learning rate decides the step size of each update during training. Large learning rates can converge toward a minimum faster, while small learning rates may locate a better minimum. [104] The goal is to find a learning rate that can quickly locate an optimal minimum, and several methods can be used to find an adequate one. Learning rate decay methods such as stepwise LR decay [105] and exponential LR decay [106] decrease the learning rate during training, which can make training both efficient and accurate. Optimizers are the optimization algorithms that find the minimum of the loss function; different optimizers give different training speeds and accuracies.
The most widely adopted optimizers include stochastic gradient descent (SGD), [107] root mean square propagation (RMSprop), [108] and adaptive moment estimation (Adam). [109] The choice of optimizer is a complicated matter that depends on the quality of the dataset and the structure of the model. In most cases, Adam can be used as the default optimizer, as it performs consistently across a variety of tasks. [110] Batch size is the number of training samples used to calculate the gradient in each iteration. Theoretically, a larger batch size gives a better estimate of the gradient, but it also incurs a larger memory cost and can even lead to a loss of generalization ability. [111] Studies also show that batch size and learning rate should be picked in pairs: a large batch size performs best with a large learning rate. [112] In summary, model selection and hyperparameter tuning have a great impact on the accuracy of the neural network.
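As a toy illustration of two of these hyperparameters, the snippet below fits a one-parameter linear model with mini-batch SGD under exponential LR decay; all constants (initial rate, decay factor, batch size) are arbitrary choices, not one of the cited training setups:

```python
import numpy as np

rng = np.random.default_rng(3)
w_true = 2.5
x = rng.normal(size=256)
y = w_true * x + rng.normal(scale=0.1, size=256)   # noisy 1D regression data

w, lr0, decay, batch = 0.0, 0.2, 0.99, 16
for step in range(200):
    lr = lr0 * decay ** step                  # exponential LR decay
    idx = rng.integers(0, 256, size=batch)    # sample a mini-batch
    grad = 2.0 * np.mean((w * x[idx] - y[idx]) * x[idx])
    w -= lr * grad                            # plain SGD update
```

Early steps with a large learning rate make fast progress; the decaying rate then damps the noise from mini-batch sampling so the estimate settles near the true slope.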
In conclusion, approaches to improving the accuracy of ML algorithms can be divided into two main categories, improvement to the dataset and to the model. The dataset should have a high level of comprehensiveness, correctness, and variety, and the model should be designed with a suitable structure and be trained with the appropriate procedure. [95]

Camera Image Signal Processing
Color correction aims to eliminate the influence of illumination and achieve color constancy for different smartphones. To understand the principle of color correction, it is important to have a general knowledge of how smartphone cameras capture images and how they are processed. An image signal processing (ISP) pipeline can be divided into two stages (Figure 8). The first stage involves the conversion of raw RGB data from camera-specific color space to device-independent color space, including preprocessing, WB, CST, etc. The second stage includes gamma encoding, postprocessing, etc. The purpose of the second stage is mainly for image compression and color manipulation.
The color signal detected by a camera sensor is determined by the product of irradiance, reflectance, and the spectral sensitivity of the camera. Following the matrix notation of Karaimer et al., [113] let l represent the illumination as a 1 × N vector, where N is the number of spectral samples in the camera's sensing range (e.g., from 400 to 700 nm). Each vector coordinate is the intensity of light at the corresponding wavelength. We use R to represent the reflectance of the imaging targets as an N × M matrix, where M is the number of imaged materials. For easier understanding, we specify the imaging target as a color rendition chart; then M is the number of color patches on the chart. We use C_cam to represent the spectral sensitivity of the camera as a 3 × N matrix, with each row corresponding to one color channel (i.e., R, G, and B). Using this notation, the camera's response under illumination l can be represented as

Φ_l = C_cam diag(l) R

where Φ_l is a 3 × M matrix whose columns are the RGB values of the color patches. For the same imaged material, the camera's response therefore depends on the illumination and the camera's spectral sensitivity. Efforts have been made to achieve color constancy in the factory. A colorimetric mapping is performed to convert sensor RGB values from the camera-specific color space to a perceptual color space. The target perceptual color space can be expressed as

Φ_xyz = C_xyz R

where C_xyz is the CIE 1931 XYZ color matching function. The illumination term is omitted in this equation because the image is assumed to be captured under ideal white light (i.e., all entries of l are equal to 1). To derive the mapping function from Φ_l to Φ_xyz, we need to know the exact value of Φ_xyz; therefore, a color checker with established reflectance is usually imaged for calibration. The colorimetric mapping can be divided into two steps: WB and CST. The whole process of colorimetric mapping is summarized in Figure 9.
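This image-formation model is straightforward to write out with NumPy; the spectra below are random placeholders, not measured camera or chart data:

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 31, 24                           # 400-700 nm in 10 nm steps; 24 patches

l = rng.uniform(0.5, 1.0, N)            # illumination spectrum (1 x N)
R = rng.uniform(0.0, 1.0, (N, M))       # patch reflectances (N x M)
C_cam = rng.uniform(0.0, 1.0, (3, N))   # camera spectral sensitivities (3 x N)

# Camera response under illumination l:  Phi_l = C_cam diag(l) R  (3 x M)
Phi_l = C_cam @ np.diag(l) @ R

# Under ideal white light (all entries of l equal to 1), the response
# reduces to C_cam @ R.
Phi_white = C_cam @ R
```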
Note that the CST in this section is different from the concept in Section 3.

White Balance
The human visual system is able to maintain the color appearance of objects under wide variations in lighting through a process called chromatic adaptation. [114] Small changes in color are discounted after the eye accommodates to the illumination; for example, a white paper is still perceived as white at dusk, even though its actual color may be yellowish. In contrast, camera sensors do not have the ability of chromatic adaptation and only record the actual light, which may therefore not appear natural to humans. A WB, which is a linear transform, is thus performed to remove the illumination's color cast. [115] One simple method for WB is RGB equalization, or the "wrong von Kries" model, with reference to a white standard and a black standard. [35] The gray patches on the color chart reflect light of different wavelengths equally, so the R, G, and B values recorded by the camera are expected to be equal for such surfaces. However, this is never the case owing to the properties of the illumination and the spectral sensitivity of the camera sensor. RGB equalization counteracts this error; although it is easy to implement, it does not ensure a natural look of images. More complex models, such as chromatic adaptation transforms (CATs) (or white point conversion), provide better human perception. Commonly used CAT models include von Kries, [116] Bradford, [117] Sharp, [118] and CMCCAT2000. [119] Another method for WB is to calculate a diagonal 3 × 3 matrix, W_d, that minimizes the difference between the ideal camera response to a white object under white light and the actual camera response. The subscript d indicates that W_d is a diagonal matrix. Because a neutral object reflects the light spectrum equally, all entries of R are equal to 1, and the camera's response to it is determined by the illumination alone. Once the illumination has been estimated, W_d can be computed directly from the estimated illumination parameters as

W_d = diag(1/ℓ_R, 1/ℓ_G, 1/ℓ_B)

where (ℓ_R, ℓ_G, ℓ_B) is the camera's response to the illumination. There are many ways to estimate the illumination, including gray world, [120] white patch-based methods, [121] the bootstrapping method, [122] statistical methods, [123] gamut-based methods, [124] and ML methods. [125] The fundamental idea is to put a white calibration object in the imaged scene; the chromatic cast on the white object is the color of the illumination in the camera's color space. Interestingly, Abdelhamed et al. [126] leveraged the difference in the responses of the two rear-facing cameras of a smartphone to estimate the illumination by training a small neural network.

Figure 9. Colorimetric mapping from the camera-specific color space to the standard CIE XYZ color space (modified after Karaimer et al. [113]).
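A minimal sketch of this diagonal WB, under the assumption that the camera's response to the illuminant is already known (all values invented):

```python
import numpy as np

def white_balance_matrix(illum_rgb):
    """Diagonal WB matrix W_d: each channel is scaled by the inverse of
    the camera's (estimated) response to the illumination."""
    r, g, b = illum_rgb
    return np.diag([1.0 / r, 1.0 / g, 1.0 / b])

# A neutral gray patch under a warm illuminant comes out reddish...
illum = np.array([1.2, 1.0, 0.8])
patch = 0.5 * illum                  # neutral reflectance times illuminant
W_d = white_balance_matrix(illum)
balanced = W_d @ patch               # ...and is neutral again after WB
```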
The obvious drawback of WB is that it cannot guarantee that the nonneutral scene materials are appropriately corrected. [113] Thus, a CST is applied to convert the raw tristimulus values captured by the camera from the camera-specific color space to a perceptual device-independent color space (e.g., CIE XYZ) for subsequent processing.

Color Space Transform
To generate chromatic images, the camera splits the light into three channels by putting a CFA (usually a Bayer mosaic filter) in front of the photodiodes to record the tristimulus values. The filter properties vary with camera model, so different cameras record different tristimulus values for the same scene even under the same illumination. CST corrects and ensures the reproducibility of image color by converting the tristimulus values from a device-dependent color space to a standard, device-independent color space. The 3 × 3 CST matrix, T_l, can be computed by minimizing the following:

T_l = argmin_T ‖ T W_d^l Φ_cam^l − Φ_xyz ‖²
where Φ_cam^l is the camera response under illumination l, and W_d^l is the WB matrix for illumination l. The calculated parameters are stored in the firmware of the camera, so customers do not need to recalculate the CST using a color chart every time before taking a photo. However, the mapping function is derived from one specific illumination, which means it can only accurately correct images captured under exactly the same illumination. Ideally, one mapping function would be calculated per illumination, but this is impossible in practice. To address this problem, two CSTs for two fixed illuminations selected far apart in terms of correlated color temperature (CCT) are generally precomputed in the factory. The actual CST applied to an image is interpolated from the two precomputed CSTs according to the estimated illumination. [127] However, because the interpolation process is based on estimation, the color reproduction accuracy is considerably affected.
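The factory interpolation can be sketched as follows. The two CST matrices below are placeholders rather than real calibration data, and the inverse-CCT (mired) weighting is one common convention; the exact scheme a given camera applies is proprietary:

```python
import numpy as np

# Two CSTs precomputed for illuminants far apart in CCT (placeholder values).
T_A = np.diag([1.10, 1.00, 0.85])     # e.g., tungsten, ~2856 K
T_D65 = np.diag([0.95, 1.00, 1.08])   # e.g., daylight, ~6504 K

def interpolated_cst(cct, cct_a=2856.0, cct_b=6504.0):
    """Blend the two factory CSTs according to the estimated scene CCT,
    weighting linearly in inverse CCT (mired)."""
    g = (1.0 / cct - 1.0 / cct_b) / (1.0 / cct_a - 1.0 / cct_b)
    g = float(np.clip(g, 0.0, 1.0))
    return g * T_A + (1.0 - g) * T_D65

T_mid = interpolated_cst(4500.0)      # intermediate illuminant
```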

Gamma Correction
Humans perceive light differently from cameras. Most camera sensors record light linearly: when twice as many photons hit the sensor, it outputs twice the signal. Human eyes do not work this way; a light source with twice the luminance is perceived as much less than twice as bright (Figure 10a). Therefore, if we directly stored the signal the camera received, many storage bits would be used to describe the brighter tones, to which humans are less sensitive, and fewer bits would be devoted to the darker tones that are more valuable for human perception. To address this problem, a nonlinear gamma encoding is applied to expand the bits used for describing darker tones and compress those used for describing lighter tones (Figure 10b). The encoding gamma is defined as

V_out = A V_in^γ

where γ < 1, V_in is the input signal after CST, and A is a constant, generally equal to 1. Because gamma encoding is performed before the captured image is converted to a JPEG file, a set of linear colorimetric signals no longer appears linear in the image. This is one of the reasons why colorimetric signals recorded by smartphones usually deviate from Beer's law.

Figure 10. a) Difference between camera response and human eye perception. b) Comparison between linearly encoded and gamma-encoded color signals. After gamma encoding, more bits are used for storing the weaker light signals to which humans are more sensitive.

One confusing point is that the screen still needs to display the same luminance as recorded by the camera sensor so as to reproduce the scene for natural human perception. Therefore, after gamma encoding, the monitor needs to perform gamma decoding before displaying the image. Decoding gamma has the same equation as encoding gamma, but with γ > 1. The decoding gamma of the display needs to match the encoding gamma of the camera (i.e., the product of the encoding and decoding gamma values is equal to 1) so as to generate the correct color signal. Note that the function of gamma is to optimize storage efficiency, not color correction: linear RAW image data would still appear natural to human eyes, but only on a linear-gamma display. The overall process of gamma correction is illustrated in Figure 11. Because color measurement occurs before gamma decoding, we need to perform gamma decoding manually to recover the linearity of the signal. However, mismatched encoding and decoding gammas would fail to achieve this goal. Unfortunately, the applied gamma value varies depending on ambient light and other automated settings, and it is not easy to access because the applied gamma curves are usually proprietary.
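A sketch of the encode/decode pair with A = 1 and a pure power law (real phone curves are proprietary and often deviate from this form; sRGB, for example, adds a linear toe):

```python
# Simple power-law gamma encode/decode pair (A = 1).
ENC_GAMMA = 1 / 2.2    # gamma < 1: expand the dark tones

def gamma_encode(v):
    """Encode a linear signal v in [0, 1] for storage."""
    return v ** ENC_GAMMA

def gamma_decode(v):
    """Inverse curve (gamma > 1): restores the linear signal."""
    return v ** (1 / ENC_GAMMA)

linear = [0.05, 0.20, 0.50]
encoded = [gamma_encode(v) for v in linear]     # dark tones are lifted
restored = [gamma_decode(v) for v in encoded]   # round trip is linear again
```

Because the product of the encoding and decoding exponents is 1, the round trip recovers the original linear values; a colorimetric measurement taken on the encoded values instead would see the nonlinear curve.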

Postprocessing
For aesthetic purposes, the images usually go through a series of modifications, such as brightness modulation, brilliance enhancement, saturation manipulation, contrast adjustment, noise reduction, etc. Those modifications are mostly nonlinear, and modification methods and specific parameters differ significantly with manufacturers. Therefore, it is difficult to reverse this process without a re-calibration step (e.g., imaging a color chart).

Color Correction
Many methods have been proposed to enhance color reproduction accuracy by improving the first stage or reversing the second stage of ISP.

Color Correction Targeting on the First Stage
Due to the complexity of postprocessing, it would be beneficial to stop the ISP before it goes through postprocessing and read the intermediary pixel values. However, this is complicated by the lack of access to the camera hardware, because the first stage is applied onboard the smartphone. Some studies used screenshots captured in the preview or video mode to circumvent device-specific postprocessing of images. [128] Karaimer and Brown [129] addressed this restriction further by introducing a software-based camera emulator that is compatible with a wide variety of cameras. This software platform is able to stop the ISP pipeline at intermediate steps and allows the acquisition and modification of intermediary pixel values. For different imaging devices, the color values of images taken right after WB and CST should show higher constancy than those after postprocessing.

Figure 11. The overall process of gamma correction. The linear raw color signal is stored nonlinearly in a JPEG file, and the display performs gamma decoding. Because the net effect of gamma encoding and gamma decoding is linear, the monitor emits the same light as the original scene, so the picture displayed on the monitor appears natural to humans. However, because the measurement step occurs before gamma decoding by the computer monitor, gamma correction needs to be performed manually to restore the linearity of the color signal.
However, as mentioned above, existing WB and CST methods are not potent enough to eliminate interphone image color variations, and several methods have been proposed to improve them. Karaimer et al. [113] proposed a first method that incorporates an additional calibrated illumination with a CCT of ≈5000 K into the interpolation process, and a second method that uses a full-color balance matrix and a fixed CST. A Bayesian classifier is trained to estimate the full-color balance matrix for camera images captured under arbitrary lighting conditions. The performance of mobile phone cameras was improved by up to 33% and 59% for the first and second methods, respectively. The first method runs fast and can easily be incorporated into the existing camera pipeline, while the second method involves ML, which means it is not suitable for use onboard a camera but can be helpful for subsequent image analysis on a computer. Finlayson et al. [118] applied a spectral sharpening transform in the form of a 3 × 3 full matrix to enhance the performance of WB. To estimate the sharpening matrix, it was necessary to capture materials with known reflectance spectra under different lighting conditions. Chong et al. [130] refined this method by directly solving for the sharpening matrix using the spectral sensitivities of the camera. Unfortunately, camera manufacturers generally do not provide detailed information on spectral sensitivities, so the burden of obtaining them is imposed on the users. Notable methods include direct and indirect approaches. Direct approaches involve recording the camera response and the spectroradiometer response to light at each wavelength; thus, a monochromator is generally used to generate monochromatic light (Figure 12). [131] Although this method ensures high accuracy, it is time-consuming and costly. Soda et al.
[132] extended this method by replacing the monochromator with a movie that swept the wavelength of the light emitted from a screen at a predefined range and rate. Indirect approaches derive the spectral sensitivity by imaging a color chart under known illumination. This method faces the problem of solving high-dimensional matrices, which are seriously rank deficient even with a large number of color patches. Tominaga et al. [133] improved the estimation efficiency and accuracy using a small number of color patches by referring to a smartphone camera spectral sensitivity database and extracting the features of the spectral function shapes. Jiang et al. [134] applied principal component analysis to the database and proposed two methods to estimate the spectral sensitivities for both known and unknown illumination. Approaches that combine direct and indirect methods are reported to be more accurate than conventional indirect methods. [135] If we can image a color chart in the same scene as the imaging target and have access to the raw data, color correction becomes straightforward: a mapping can be derived from sensor RGB directly to the CIE XYZ color space without the need for illumination correction. Notably, Funt and Bastani [136] solved the problem of color calibration under nonuniform illumination across the color chart using a numerical optimizer; they later proposed a faster and easier technique based on least-squares regression on the unit sphere. [137] Hong et al. [138] tested different polynomial transfer matrices for converting a device-dependent color space to the CIE XYZ color space using a least-squares fitting technique. Polynomial models of higher order were reported to yield better results than 3 × 3 linear transforms.
However, polynomial color correction is more susceptible to camera exposure than linear color correction, as a change in exposure alters the polynomial terms nonlinearly, leading to hue and saturation shifts. [139] Finlayson et al. [139] later proposed fractional (root) polynomials to address this problem by taking the k-th root of each k-degree polynomial term. Bianco et al. [140] pointed out that the color correction transform can amplify illuminant estimation errors; they demonstrated that it is possible to mitigate this error amplification by considering the probability distribution of the illumination estimation algorithm.
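The least-squares fit behind such polynomial transfer matrices can be sketched with a second-order expansion and synthetic chart data; the "true" mapping below is invented so that the fit can be checked exactly:

```python
import numpy as np

def poly_expand(rgb):
    """Second-order polynomial expansion of device RGB (one common choice
    of terms for polynomial color correction)."""
    r, g, b = rgb.T
    return np.column_stack([r, g, b, r * g, r * b, g * b,
                            r * r, g * g, b * b])

rng = np.random.default_rng(5)
device_rgb = rng.uniform(0, 1, (24, 3))        # 24 chart patches (synthetic)
M_true = rng.uniform(-0.2, 0.8, (9, 3))        # invented device->XYZ mapping
target_xyz = poly_expand(device_rgb) @ M_true  # reference chart values

# Least-squares fit of the polynomial color-correction matrix.
M_fit, *_ = np.linalg.lstsq(poly_expand(device_rgb), target_xyz, rcond=None)
corrected = poly_expand(device_rgb) @ M_fit
```

With real chart measurements the system is overdetermined and noisy, so the fit minimizes the residual rather than recovering an exact mapping; the exposure sensitivity noted above arises because scaling r, g, b scales the linear and quadratic terms differently.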

Color Correction Targeting on the Second Stage
Methods that reverse the second stage to undo the nonlinear processing have been extensively reported. As with the derivation of the CST, a color chart can be used to restandardize the image color. Before the full color of the image is corrected using all colors in the color chart, a gamma correction is usually applied using the grayscale patches of the imaged color checker.
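A minimal sketch of this grayscale-based gamma correction step, assuming a simple power-law model (measured ≈ reflectance^γ, both normalized to [0, 1]); the patch reflectances and the gamma of 2.2 are invented for illustration:

```python
import numpy as np

def estimate_gamma(reflectance, measured):
    # measured ≈ reflectance ** gamma (both in [0, 1]); taking logs turns
    # this into a one-parameter least-squares problem through the origin
    x = np.log(reflectance)
    y = np.log(measured)
    return float(np.sum(x * y) / np.sum(x * x))

# synthetic grayscale patches imaged through a hypothetical gamma of 2.2
reflectance = np.array([0.03, 0.09, 0.19, 0.36, 0.59, 0.90])
measured = reflectance ** 2.2

gamma = estimate_gamma(reflectance, measured)
linearized = measured ** (1.0 / gamma)   # undo the nonlinearity
```

Real grayscale patches carry sensor noise, so the log-domain fit would be a regression rather than an exact recovery, but the structure of the step is the same.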
Takahashi et al. [141] created a color chart to increase the color accuracy of telemedicine, where the patient's skin or tongue color is examined. They applied gamma correction and then color correction to the images captured by patients with a color chart placed alongside, using a multiple regression algorithm. They also considered that display performance varies by model, so the same color cannot be reproduced even if the same RGB values are input. Therefore, a color correction for the doctors' displays was also performed using the color chart and a colorimeter to ensure display color reproducibility. You et al. [142] proposed a low-cost, contactless chicken meat quality evaluation method by examining color images of chicken meat. The image color was corrected using a multivariate linear regression model with reference to a color chart placed beside the meat. Polynomial models with different orders and numbers of parameters were tested. It was shown that a model with too few terms was unable to correct the color well, while too many terms increased the computational burden. After experiments on a set of sample images, a second-order polynomial model with nine parameters was adopted. Moreover, a new color card localization method without complicated computation was proposed, with steps of global three-channel thresholding, contour extraction, and color block localization (Figure 2a).

Figure 12. Experimental setups for measuring the spectral sensitivity of smartphone cameras using a monochromator and a spectroradiometer. Reproduced with permission. [133] Copyright 2021, MDPI.

www.advancedsciencenews.com www.advintellsyst.com

Kim et al. [55a] proposed a smartphone-based colorimetric pH detection method using a color adaptation algorithm performed in the CIELUV color space. A 3D-printed mini light box and a paper-printed color chart were prepared. A third-order polynomial model with 20 parameters was used for color correction. The parameters were solved with the minimum absolute deviation solution obtained by iteratively reweighted least squares. Similarly, an automatic color-retrieving algorithm based on contour extraction was proposed to obtain all the color values of the captured image automatically (Figure 2c). Zhang et al. [143] combined images captured with a smartphone-based microscope with those captured by a lens-free holographic microscope to generate high-resolution color images of specimens. A polynomial regression model was used for color correction (Figure 13). Specifically, the images were first normalized and white balanced using an empty calibration image taken without a sample. The color information was then transformed from the RGB color space to the LAB color space. A fitting function for lightness (gamma) correction was calculated from the L component of the output and the L component of the ground truth. The saturation of the image was enhanced to match the ground truth by appropriately scaling the chroma component (C = √(a² + b²)). The scaling factor was calculated in the least-squares sense. Finally, a second-order polynomial with 12 parameters in the LAB color space was used for color correction. Cugmas et al. [42] supplemented a typical veterinary teledermoscopy system with a conventional color calibration procedure and studied two midpriced smartphones for evaluating native and erythematous canine skin color. It was claimed that comparably accurate colorimetric measurements could be achieved with a simple image normalization on a white surface, without the need for time-consuming full-color calibration.
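The chroma-scaling step described for Zhang et al. amounts to a one-parameter least-squares problem: find the factor s minimizing ||s·C_img − C_truth||², then scale the a and b channels jointly so the hue angle is preserved. The numbers below are synthetic, with the captured image modeled as a uniformly desaturated version of the ground truth:

```python
import numpy as np

def chroma(ab):
    # chroma in LAB space: C = sqrt(a^2 + b^2)
    return np.sqrt(ab[:, 0] ** 2 + ab[:, 1] ** 2)

def chroma_scale(ab_img, ab_truth):
    # closed-form least-squares factor s minimizing ||s*C_img - C_truth||^2
    c_img, c_truth = chroma(ab_img), chroma(ab_truth)
    return float(np.sum(c_img * c_truth) / np.sum(c_img * c_img))

rng = np.random.default_rng(2)
ab_truth = rng.uniform(-60, 60, (20, 2))   # synthetic ground-truth a, b values
ab_img = 0.7 * ab_truth                    # hypothetically desaturated capture

s = chroma_scale(ab_img, ab_truth)
ab_corrected = s * ab_img   # scale a and b jointly, preserving the hue angle
```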
They also used a linear regression model for color correction, but with first order and only four parameters. They did not recommend using a high-order polynomial with a small number of measurements because it easily leads to overfitting. Similarly, Cebrián et al. [144] tested three different colorimetric analyses using different types and brands of smartphones with the assistance of a 3D-printed light shield box. They concluded that a linear least-squares fit was sufficient to achieve color constancy. Overall, imaging a color chart and performing color correction with a polynomial regression model is the most common way to reverse the second-stage image manipulation. The specific model and parameters may depend on the assay. A light shield box may help to simplify the correction algorithms and improve calculation efficiency. Some researchers also found that brighter, higher-contrast images produced with longer exposure times, provided clipping is avoided, can sometimes contribute to better color constancy. [42] Apart from polynomial regression models, ML can also contribute to color correction. One possible strategy for combining color calibration with ML is to design a small neural network that determines the calibration function for each image. The general procedure for color correction is to find a 3D function f(R, G, B) that outputs the calibrated value of each pixel. Suppose the ground-truth value of a pixel is (R_g, G_g, B_g) and the actual value is (R, G, B); then the value of the calibration function at this point should be f(R, G, B) = (R_g, G_g, B_g). In linear regression-based calibration, the function f(R, G, B) is assumed to be linear, while in polynomial regression-based models, it is assumed to be polynomial.
If we consider the actual pixel values as the input and the ground-truth values as the labels, we can train a neural network for each image whose calibration function has no fixed form. Considering the inconsistency in cameras' photosensitive elements and ISPs, a flexible calibration method is theoretically better than a fixed regression method. For the ML calibration method to work, the number of reference colors with ground truth needs to be large, meaning the image needs to be taken with many standard colors spanning all colors the camera can detect. Also, to prevent overfitting, part of the dataset needs to be set aside as a validation set, and training should stop when the validation error stops improving. By doing so, we can train a neural network that calibrates one single image. The advantage of this approach is that the accuracy of color calibration should exceed that of regression-based methods. The drawback is that the model needs to be trained for every single image from a specific camera, which is inefficient. Thus, a unified feature mapping that is insensitive to color types may be a promising solution in the future, especially using pretrained models.
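A toy numpy sketch of this per-image strategy, with every detail invented for illustration: synthetic chart colors, a hypothetical per-channel ISP distortion standing in for one phone's processing, a one-hidden-layer network trained by full-batch gradient descent, and early stopping on a held-out validation split:

```python
import numpy as np

rng = np.random.default_rng(3)

# synthetic "color chart": camera RGB -> ground-truth RGB through an
# invented per-channel ISP distortion (stand-in for one phone's pipeline)
X = rng.uniform(0, 1, (200, 3))
Y = 0.9 * X ** 1.8 + 0.05

# hold out part of the chart as a validation set to detect overfitting
X_tr, Y_tr, X_va, Y_va = X[:150], Y[:150], X[150:], Y[150:]

# one-hidden-layer network, trained by full-batch gradient descent on MSE
H = 16
W1 = rng.normal(0, 0.5, (3, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.5, (H, 3)); b2 = np.zeros(3)

def forward(X):
    h = np.tanh(X @ W1 + b1)
    return h, h @ W2 + b2

def val_error():
    return float(np.mean((forward(X_va)[1] - Y_va) ** 2))

init_val = val_error()
lr, best_val, patience, bad = 0.1, np.inf, 50, 0
for step in range(3000):
    h, pred = forward(X_tr)
    err = pred - Y_tr
    # backpropagate through the two layers
    gW2 = h.T @ err / len(X_tr); gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)
    gW1 = X_tr.T @ dh / len(X_tr); gb1 = dh.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
    # early stopping once the validation error stops improving
    v = val_error()
    if v < best_val - 1e-9:
        best_val, bad = v, 0
    else:
        bad += 1
        if bad > patience:
            break

val_mse = val_error()
```

The learned f(R, G, B) is then applied pixelwise to the image that contained the chart; as noted above, the cost is that this training loop must be repeated for every image.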
Regarding the design of the color rendition chart, it can have different colors and different numbers of patches, and its pattern can be optimized for different applications. Contrary to the intuition that more color patches yield better color correction, Akkaynak et al. [35] demonstrated that the transformation error stopped decreasing after the inclusion of the 18th patch. They also demonstrated that using patches whose radiance spectra span the subspace of those in the scene yielded the most accurate transforms. If the radiance spectra of the scene span only a small range of wavelengths (e.g., a forest that is rich in shades of green and brown but poor in other colors), a standard color chart that is broad in chromaticity but not specific to one color may perform poorly. In most cases of colorimetric biosensing, the changes in color signals are small and occur in only one or two hues, and there is no need to correct colors outside the test region. Therefore, to increase the accuracy of the testing results and reduce the computational burden, a color calibration target specific to the detection signal is usually developed. For example, Takahashi et al. [141] used only color patches similar in color to human skin and tongue for examining the skin or tongue (Figure 14a), and Cugmas et al. [42] used only the white and skin patches of the Digital SG ColorChecker for skin color analysis (Figure 14b). Note that the inclusion of grayscale patches is necessary because they are important for WB and gamma correction, while using grayscale patches alone is not enough to correct chromatic colors.

Figure 13. An example of color correction using polynomial regression. Reproduced with permission. [143] Copyright 2016, Springer Nature.
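The idea of an assay-specific calibration target can be sketched as a simple patch-selection rule: keep the chromatic patches near the assay's signal hue, plus all near-neutral patches needed for WB and gamma correction. The chart colors, hue tolerance, and saturation floor below are all invented for illustration:

```python
import colorsys

# hypothetical chart: (name, (r, g, b)); all values invented for illustration
chart = [
    ("red",    (0.80, 0.15, 0.10)),
    ("orange", (0.90, 0.55, 0.10)),
    ("green",  (0.20, 0.70, 0.25)),
    ("blue",   (0.15, 0.25, 0.80)),
    ("white",  (0.95, 0.95, 0.95)),
    ("gray",   (0.50, 0.50, 0.50)),
    ("black",  (0.05, 0.05, 0.05)),
]

def select_patches(chart, target_hue, tol=0.08, sat_floor=0.15):
    """Keep chromatic patches whose hue lies near the assay's signal hue,
    plus all near-neutral (grayscale) patches needed for WB and gamma."""
    keep = []
    for name, rgb in chart:
        h, s, v = colorsys.rgb_to_hsv(*rgb)
        d = min(abs(h - target_hue), 1 - abs(h - target_hue))  # hue wraps at 1
        if s < sat_floor or d <= tol:
            keep.append(name)
    return keep

# e.g., a colorimetric assay whose signal shifts around red (hue ≈ 0.0)
selected = select_patches(chart, target_hue=0.0)
```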
Manufacturing a color chart is not as easy as printing a colorful picture on paper. The surface of the color chart should ideally be Lambertian, meaning it reflects light equally in all directions. In particular, glossiness should be avoided because it compromises correct color acquisition (Figure 14c). Most importantly, the reflectance of the color chart needs to be consistent across production batches (i.e., the color chart must be reproducible).
Some research has attempted to correct image color without using color charts. Escobedo et al. [145] performed color correction with reference only to a full black spot and a full white spot using the "wrong von Kries" model. The fitting function corresponded well to the gamma correction curve. Hu et al. [146] used an SVM to predict the illumination condition and the corresponding color correction matrix with reference to the color difference between images taken with and without a flashlight. It was assumed that taking an image with a flashlight is equivalent to imposing an intensity value on each channel of the image taken without a flashlight in the same environment. Nevertheless, the imposed intensity vector differs if the images are captured under different illumination conditions. Therefore, the color difference between images captured with and without flash can be used to estimate the current lighting environment. Without an illumination estimation process, Lee et al. [147] leveraged the difference between two images captured under the auto WB mode and a preset mode, respectively, to derive the color conversion function. Although these methods may work well for illumination correction, their performance across different smartphone models was not tested.

Figure 14. Color checkers used for different applications. a) Designed color checker for tongue color analysis. Reproduced with permission. [141] Copyright 2020, International Society of Artificial Life and Robotics. b) Skin patches selected for skin color analysis (in red square). Reproduced with permission. [42] Copyright 2020, MDPI. c) A bad case of color checker imaging. Reproduced with permission. [142] Copyright 2020, IEEE.
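A toy model can illustrate why the flash/no-flash difference carries illumination information: if the camera applies illuminant-dependent white-balance gains, the normalized difference between the two captures becomes a signature of the ambient light. The illuminant spectra, gray-world gains, and nearest-signature classifier below are all invented stand-ins for the SVM pipeline of the original work:

```python
import numpy as np

flash = np.array([0.9, 0.9, 0.9])   # assumed spectrally neutral flashlight
illuminants = {
    "daylight":     np.array([1.0, 1.0, 1.0]),
    "incandescent": np.array([1.3, 1.0, 0.6]),
    "fluorescent":  np.array([0.9, 1.2, 0.9]),
}

def wb_gain(illum):
    # gray-world white-balance gains the camera would apply under this light
    return illum.mean() / illum

def capture_pair(illum, reflectance):
    # mean scene color without and with flash, after in-camera white balance
    g = wb_gain(illum)
    return g * illum * reflectance, g * (illum + flash) * reflectance

def diff_signature(illum, reflectance):
    no_flash, with_flash = capture_pair(illum, reflectance)
    d = with_flash - no_flash          # the imposed intensity vector
    return d / np.linalg.norm(d)       # normalize out scene brightness

# reference signatures computed on a gray scene
gray = np.array([0.5, 0.5, 0.5])
signatures = {name: diff_signature(il, gray) for name, il in illuminants.items()}

def classify(illum, reflectance):
    # nearest-signature classifier, a stand-in for the SVM in the original work
    d = diff_signature(illum, reflectance)
    return min(signatures, key=lambda n: np.linalg.norm(signatures[n] - d))

# a near-gray scene under incandescent light is still recognized correctly
pred = classify(illuminants["incandescent"], np.array([0.55, 0.50, 0.45]))
```

The signature depends on the white-balance gains, which in turn depend on the ambient illuminant, which is exactly the effect the real method exploits.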

Conclusion and Future Perspectives
Smartphones are ubiquitously available nowadays. Combined with their low cost, high portability, and high versatility, they show great advantages over cumbersome laboratory equipment in telemedicine, e-health, environmental monitoring, and other scenarios of point-of-need detection. In particular, smartphone-based optical detection systems have attracted increasing interest in recent years due to their simplicity in colorimetric signal measurement. However, most studies focus on biosensing technology development and pay little attention to the difficulties of accurate signal acquisition in different environments. The color of images captured by smartphones is highly susceptible to illumination conditions, camera spectral sensitivity, and the ISP pipeline. A small error in color information can lead to an unacceptable concentration error for the analyte, depending on the sensitivity of the assay. Achieving color constancy is therefore of great importance. In this review, we summarize and discuss four types of recently developed methods targeting color constancy: 1) adapter attachment, 2) CST, 3) ML, and 4) color correction. An adapter effectively excludes ambient light and provides a uniform, homogeneous light source. It ensures the same illumination conditions and greatly reduces color variation. The development of a miniature adapter and an APP with a color-retrieving algorithm is highly applicable to real cases. Selecting an appropriate color space can further improve biosensor performance; for example, it has been verified that hue is more resistant to illumination variations yet sensitive to chromatic changes, and the LAB color space is more sensitive to small color changes than RGB. ML is a potential candidate for improving detection capability and has recently been adopted to overcome illumination variation and interphone variation.
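The claim that hue resists illumination intensity variation can be verified in a few lines with the standard library: uniformly dimming an RGB color leaves its HSV hue (and saturation) unchanged, and only the value channel scales. The sample color is arbitrary:

```python
import colorsys

# a chromatic color and the same color under uniformly dimmer illumination
rgb = (0.8, 0.3, 0.2)
dim = tuple(0.5 * c for c in rgb)

h1, s1, v1 = colorsys.rgb_to_hsv(*rgb)
h2, s2, v2 = colorsys.rgb_to_hsv(*dim)

# hue depends only on channel ratios, so uniform scaling leaves it unchanged
hue_shift = abs(h1 - h2)
```

This invariance holds only for uniform intensity changes; a change in the illuminant's color temperature shifts the channel ratios and therefore the hue, which is why color correction remains necessary.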
Hitherto, ML has mostly been performed on computers or cloud servers on account of the relatively low computing power of smartphones. Because ML-based colorimetric biosensing is mostly based on classification, an extensive amount of training data is required to obtain results with high precision. ML could significantly increase the cost of POC diagnosis and may not be suitable for resource-limited regions and the developing world. Nevertheless, with the development of high-performance smartphone processors and ML algorithms, it is anticipated that smartphones will be able to perform ML with high speed and low cost in the near future. For example, many lightweight networks have been specially designed for mobile phones, including MobileNets, [148] EfficientNet, [149] and MixNet, [150] all of which perform well on mobile devices. To further improve the detection limit using ML, efforts should be made to build better datasets and to design more suitable model structures.
In the case of smartphone-based optical sensors, an ideal dataset should contain images from different models of smartphones under different lighting environments. The dataset should also be augmented with CSTs and shape transformations. This will not only increase the size of the dataset but also make the readout insensitive to the type of smartphone. As for model design, current studies have mainly modified existing models, which may not be optimal for the task of reading out POCT biosensors. The semantic context of the image should be taken into consideration when developing a neural network model. In POCT biosensor images, the features are mainly low-level, [151] and the region of interest (ROI) is usually in a fixed position. The design of U-net [152] considers the characteristics of biomedical images, and it performs better than other structures in biomedical image segmentation tasks. If a new model is designed specifically for the readout of POCT biosensors, it may improve the detection limit of current methods. Last but not least, color correction based on a color chart shows the greatest promise for eliminating illumination variation and interphone variation, with a relatively simple algorithm and high effectiveness. To further improve the detection limit, the color correction method could be combined with ML algorithms. Current color correction algorithms are based on a fixed calibration model, which assumes that the relationship between a pixel value before and after calibration fits a linear or polynomial function. Each pixel value of a photo is determined by the light intensity, the characteristics of the photosensitive elements, and the ISP, so a fixed model may not be suitable for all models of smartphones. If ML is applied to color calibration, it can generate a nonlinear mapping function for color correction that maps all types of color into a unified feature space.
Thus, it may eliminate the influence of camera inconsistency and improve detection accuracy. Moreover, the raw data after stage one of the ISP should show higher color constancy than the data that pass through the second stage. However, most studies have focused on reversing the second stage owing to the lack of access to raw image data. Although some research has targeted improving the first stage of the ISP, few studies have used the raw camera data for biosensing purposes. The four summarized strategies can be blended with each other (e.g., using an adapter and color correction together ensures much higher color accuracy than using color correction alone [144] ), and better performance of smartphone-based optical detection systems is expected in the near future. Although smartphone-based POCT technology has the advantages of rapidity, low cost, and ease of popularization, its performance in terms of detection sensitivity, repeatability, and throughput awaits further improvement compared with automatic detection platforms in the laboratory. Besides these emerging technologies for optical signal outputs, advances [153] in ultrasensitive assays (such as CRISPR/Cas biosensing [154] and nanomaterial-facilitated biosensors), printable biosensors (such as standardization of biosensor fabrication and preparation), and microfluidics (such as sensor arrays) will be beneficial for achieving the desirable performance of smartphone-enabled point-of-care diagnostics.