Temperature-Robust Learned Image Recovery for Shallow-Designed Imaging Systems



Introduction
The imaging behavior of conventional imaging systems can be affected by a number of environmental factors. For instance, robots and drones with machine vision cameras are widely deployed in polar or desert scientific expeditions, where unexpected sharp temperature variation may alter the refractive index and thickness of their lenses, as well as the sensor response. The resulting defocus blur [1] and thermal noise [2] may dramatically degrade the fidelity of the acquired information. Furthermore, in daily life, most low-cost imaging systems such as vehicle or phone cameras rely on simple optical designs and postprocessing algorithms, leaving them vulnerable to thermo-optic effects. [3,4] Owing to the intensive use of plastic lenses, this problem is becoming increasingly critical. As traditional approaches [5] to handling thermo-optic effects show defects that limit their deployment in compact machine vision cameras, we develop a temperature-robust learned image recovery scheme, inspired by emerging advances in machine intelligence.

Temperature-Robust Optical Design
Athermalization techniques aim to realize a stable focal power over a wide range of temperatures. Typically, there are three types: passive optical, [6] active mechanical, [7] and passive mechanical methods. [8] These solutions compensate for thermal defocus by, respectively, combinations of optical materials, thermal expansion of mechanical components, and motorized mechanical parts. However, mechanical methods are often insufficiently robust and unfavorable to miniaturization, while optical ones usually require several types of lens material and complex structural designs. Alternatively, emerging wavefront coding approaches [9,10] alleviate thermal defocus by extending the depth of focus and applying proper decoding algorithms, but their viability is compromised by factors including a narrow field of view, decoding noise, and the requirement of extra optical elements.

Learned Image Recovery
Intuitively, resolving high-quality images from degraded ones can be regarded as image recovery, which considerably mitigates the blur and distortion caused by the imperfect imaging and diffraction of the optical system, in tandem with possible sensor noise. For these issues, deep learning techniques, in particular convolutional neural networks (CNNs), have proven more effective than conventional iterative deblurring and denoising algorithms. For example, the end-to-end optimization of a CNN [11] keeps errors from accumulating across the computing process. DnCNN [12] leveraged residual learning and batch normalization to handle additive white Gaussian noise and even blind Gaussian noise. Abuolaim et al. [13] presented DPDNet, which uses dual-pixel data to generate a clear image for handling defocus blur. Although a considerable amount of deblurring and denoising work is reported yearly, [14][15][16][17] most of it focuses on solving one of the problems without considering their joint effects over a wide range of temperatures. Yet, resolving latent images demands consideration of the peculiar optical properties [3] and sensor responses to temperature.

Domain-Specific Imaging
A premium recovery relies on a prodigious dataset: more data means more valuable features can be learned in latent space by the network. Concomitantly, however, datasets containing different scenes and characteristics often showcase various or even conflicting features, which may burden model learning for domain-specific tasks, e.g., underwater color correction, [18] high dynamic range imaging, [19] and vehicle compound lens imaging. [20] Although a denoising dataset (SIDD) [21] and a defocus blur dataset [13] are dedicated to task-specific imaging, they deviate from our goal of resolving clear images when both noise and defocus blur are present due to temperature variance. Notably, these factors can vary dramatically with temperature; thereby, a dataset acquired under one condition usually fails to incorporate all kinds of degradation. As such, it is valuable to facilitate a series of domain-specific datasets containing various corrupted features, for the sake of a robust reconstruction network.

Motivation and Contributions
We develop a temperature-robust image recovery strategy for shallow-designed imaging systems to address both thermal defocus and noise corruption due to temperature instability. The following technical contributions are investigated: 1) Taking into account the varying optical properties and sensor responses at different temperatures, we facilitate a series of datasets to train a robust multibranch reconstruction network model, where each branch corresponds to a unique temperature. 2) To reduce the redundancy of branch networks, we apply a flexible temperature division and mixture scheme to fuse branches according to their degradation levels, and further compress the scale of the network architecture. 3) Theoretically and experimentally, we validate the proposed strategy on two kinds of off-the-shelf camera modules, enabling high-quality imagery at varying temperatures without tailored optical designs or sensors.
The proposed strategy not only paves the way to alternative athermalization imaging solutions from the deep learning perspective, but also enables a division and mixture scheme to reduce model redundancy and computational expense. With flexibility and robustness, it promises benefits to numerous intelligent machine vision systems in photography, robotics, autonomous vehicles, drones, medical devices, and other human vision and sensory applications.

Temperature-Robust Computational Imaging Framework
Regarding optics, fluctuating temperature affects both the ray refractive properties and the surface form factor of optical elements, leading to temperature-sensitive focal power and point spread function (PSF) behavior. Typical passive optical designs tend to compensate for such deviation by combining different materials. However, it is often challenging to reach a good balance with limited material options under the additional premise of aberration constraints and other inherent design metrics.
Concerning image formation, a degraded observation I can be expressed as

I(x, y) = ∫ Q_c(λ) [I_c(x, y) ⊛ p(x, y, d, λ, T)] dλ + N

where the PSF p(x, y, d, λ, T) depends not only on the spatial position (x, y), object distance d, and incident spectral distribution λ, but also on the temperature T, and ⊛ denotes convolution. Q_c is the spectral response of the sensor color filters, and I_c is the latent clear image. N represents the sensor noise, which varies exponentially with temperature due to electron thermal motion. As such, a shallow-designed imaging system records images corrupted by different levels of blur, aberration, and noise at various temperatures. Intuitively, an isolated reconstruction model that only considers room temperature does not often work well as postprocessing to resolve clear images over a wide range of temperatures. Therefore, we develop a temperature-robust multibranch image reconstruction model, seeking to yield output images of comparable quality regardless of temperature.
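A minimal numerical sketch of this forward model follows, assuming a single channel, a spatially invariant PSF, and additive Gaussian noise standing in for the temperature-dependent term N (the kernel flip of true convolution is omitted, which is identical for symmetric PSFs):

```python
import numpy as np

def degrade(latent, psf, noise_sigma, rng=None):
    """Blur a single-channel latent image with a (temperature-dependent)
    PSF via 2D convolution with edge padding, then add Gaussian noise
    standing in for the thermal sensor noise N."""
    rng = rng if rng is not None else np.random.default_rng(0)
    kh, kw = psf.shape
    h, w = latent.shape
    padded = np.pad(latent, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    blurred = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            blurred += psf[i, j] * padded[i:i + h, j:j + w]
    return blurred + rng.normal(0.0, noise_sigma, latent.shape)
```

In a full simulation, one PSF and one noise level would be attached to each sampled temperature, yielding one degraded copy of the dataset per temperature.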
As shown in Figure 1, we present datasets obtained at different temperatures to the learned networks and train corresponding branches of the full reconstruction model. For boosting realistic visual effects, each branch model explores a generative adversarial network (GAN) [22] with a generator based on a variant of the U-net architecture. [23] In the test or application phase, the final model automatically selects the appropriate branch network with the temperature detector feedback to resolve a photorealistic image.
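The branch selection at test time reduces to a lookup from the detector reading to a trained branch; a sketch follows, where the range boundaries in `RANGES` are hypothetical placeholders rather than the paper's actual R1-R4 division:

```python
def select_branch(temp_c, ranges):
    """Return the branch whose temperature range contains the detector
    reading; readings outside every range fall back to the branch with
    the nearest boundary. `ranges` maps branch name -> (low, high) in C."""
    for name, (lo, hi) in ranges.items():
        if lo <= temp_c <= hi:
            return name
    return min(ranges, key=lambda n: min(abs(temp_c - ranges[n][0]),
                                         abs(temp_c - ranges[n][1])))

# Hypothetical division for illustration only (not the paper's bounds).
RANGES = {"R1": (0, 40), "R2": (41, 55), "R3": (56, 70), "R4": (71, 80)}
```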

Lightweight, Multibranch Image Recovery Network
We seek to realize temperature-robust, high-quality imaging by leveraging a multibranch GAN-based model, which carries a risk of redundancy. For simultaneous lightness and robustness, we further simplify the network architecture through its base unit, and explore a proper scheme of dividing temperature ranges and mixing the acquired datasets so as to reduce the number of branches.

Lightweight Base Network Design
In each branch of the reconstruction model, the GAN attempts to minimize a weighted combination of adversarial loss and perceptual loss [24] to recover degraded images robustly at different temperatures, which avoids the overly blurred results caused by the pixel-wise average of possible optima. [25]

Adversarial Loss
Instead of using a traditional adversarial loss, [22] we apply a variant of the Wasserstein GAN [26] to minimize the Wasserstein distance between predictions and ground-truth images for a robust training process. The adversarial loss function can be expressed as

L_adv = E_{Ĩ∼P_g}[D(Ĩ)] − E_{I∼P_r}[D(I)]

where P_g and P_r represent the distributions of the model and the data, respectively. In training, the adversarial loss aims to minimize the structural deviation between the generated image Ĩ and the real image I, penalizing missing structures.
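From raw critic scores, the WGAN objectives can be sketched as below (a simplified numpy sketch; the gradient-penalty or weight-clipping term of the actual Wasserstein GAN variant is omitted):

```python
import numpy as np

def wasserstein_losses(d_real, d_fake):
    """WGAN-style objectives from raw critic scores: the critic minimizes
    E[D(fake)] - E[D(real)] (i.e., maximizes the Wasserstein estimate),
    while the generator minimizes -E[D(fake)]."""
    critic_loss = float(np.mean(d_fake) - np.mean(d_real))
    gen_loss = float(-np.mean(d_fake))
    return critic_loss, gen_loss
```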

Perceptual Loss
The perceptual loss encourages outputs to have high-level feature representations similar to those of the ground-truth images, extracted from the 15th layer of a pretrained VGG19. [27] Minimizing the distance between the Gram matrices of the generated images and the references, the loss function can be defined as

L_percep = || G(φ(Ĩ)) − G(φ(I)) ||_F^2

where φ(·) denotes the VGG19 feature maps and G(·) their Gram matrix. A strong resemblance between high-level feature representations indicates that the generated images exhibit high visual quality similar to the ground truth.
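A minimal sketch of this Gram-matrix loss follows; the feature maps here stand in for the pretrained VGG19 activations the paper uses:

```python
import numpy as np

def gram(features):
    """Gram matrix of a C x H x W feature map: channel-wise inner
    products normalized by the number of spatial positions."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)

def perceptual_loss(feat_gen, feat_ref):
    """Squared Frobenius distance between the Gram matrices of generated
    and reference feature maps (e.g., taken from a fixed VGG19 layer)."""
    diff = gram(feat_gen) - gram(feat_ref)
    return float(np.sum(diff ** 2))
```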

Network Depth Fine-Tuning
In a sense, the suitable depth of a branch network should be determined by the degradation level. We note that a lightweight model is capable of alleviating slight degradation at room temperature but may not work well at extreme temperatures. In each branch of the reconstruction model, one can adopt a generator consisting of a different number of network blocks subject to the degradation level. This insight is conducive to further reducing the computational expense, because extreme-case degradation is usually rare in the wild. Notably, although it may in practice be clever to adopt more network blocks in extreme-temperature models, in the following experiments we use the same network architecture (see Section 4.2.1) in each branch model for a fair comparison of recovery performance subject to temperature. In addition, rather than diving into advanced network architectures, it may be more persuasive to adopt a fundamental, well-established architecture to reproduce high-quality images by the superiority of the proposed strategy. The specific network architecture can be found in Section 2 of the Supplement.

Temperature Division and Data Mixture
As it is superfluous and unaffordable for hardware to specify a model at each temperature, we explore a temperature division and data mixture scheme to reduce the number of branches. Taking a vehicle lens module without the athermalization design as an example, we first assess the blur level of its measured images at uniformly sampled temperatures (5°C apart) by peak signal-to-noise ratio (PSNR) between the ring-split convolution simulation results and ground truth (see blue curve in Figure 2). In the simulation, we split the ground truth into a sequence of concentric rings to match PSFs in the corresponding field of view, and then convolve them with corresponding PSFs and add the Gaussian noise to simulate sensor responses.
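The ring-split simulation can be sketched as a simple spatially varying convolution: blur the full image with each field-matched PSF, then keep each result only inside its concentric ring (a sketch; the paper's actual ring boundaries and PSF sampling are assumptions here):

```python
import numpy as np

def conv2(img, psf):
    """Plain 2D convolution with edge padding (kernel flip omitted,
    which is identical for symmetric PSFs)."""
    kh, kw = psf.shape
    h, w = img.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            out += psf[i, j] * padded[i:i + h, j:j + w]
    return out

def ring_split_convolve(img, psfs):
    """Field-dependent blur: convolve the image with each ring's PSF and
    stitch the results together along concentric rings about the center."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - (h - 1) / 2, xx - (w - 1) / 2)
    edges = np.linspace(0.0, r.max() + 1e-9, len(psfs) + 1)
    out = np.zeros((h, w))
    for k, psf in enumerate(psfs):
        mask = (edges[k] <= r) & (r < edges[k + 1])
        out += mask * conv2(img, psf)
    return out
```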
To further confirm the necessity of the multibranch strategy, we apply the basic network trained on the dataset at 20°C to recover blur images at all sampled temperatures (see Net-20 in Figure 2). We note that it behaves well in the low- and medium-temperature ranges, where the deviation of the blur level is relatively flat, but performs poorly in the high-temperature range, where the blur level varies acutely. As such, we divide the temperature range subject to the PSNR distribution of the blur images. When the PSNR drops beyond a certain threshold, marked as ΔPSNR_th, we set a demarcation and assume that there exist nonnegligible differences in blur images between two adjacent ranges, which are unsuitable to be recovered by the same model. In this work, we set ΔPSNR_th to 1 dB and divide 0-80°C into four ranges marked R1 to R4. Note that ΔPSNR_th is adjustable and its precision depends on the requirements of the application. Intuitively, a smaller threshold would result in more precise divisions, while burdening computational memory by creating more branches.
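One way to express this demarcation rule is a greedy scan over the sampled temperatures, opening a new range whenever the blur PSNR drifts beyond the threshold (a sketch under the assumption that the drop is measured against the first sample of the current range; the paper's exact rule may differ):

```python
def divide_ranges(temps, psnrs, delta_th=1.0):
    """Greedy sketch of the ΔPSNR_th division: start a new temperature
    range whenever the blur-image PSNR drifts by more than delta_th (dB)
    from the first sample of the current range."""
    ranges, start = [], 0
    for i in range(1, len(temps)):
        if abs(psnrs[i] - psnrs[start]) > delta_th:
            ranges.append((temps[start], temps[i - 1]))
            start = i
    ranges.append((temps[start], temps[-1]))
    return ranges
```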

Eventually, we blend the blur images of the sampled temperatures within the same range and construct a hybrid dataset to train a robust branch model competent for the corresponding range. Taking R1 as an example, we sequentially assign 68 images at nine sampled temperatures to form a training set with 612 data pairs, i.e., the total scale of the training dataset. The superiority of this trick is verified in Section 4.2. Notably, the currently divided ranges guide the initial training process but do not determine the number of branches of the final recovery model, i.e., adjacent ranges can be merged and retrained if their specific models perform similarly or the hardware resources are strictly limited. The detailed scheme can be found in Section 4.1 of the Supplement.
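The hybrid-set construction amounts to a cross product of images and sampled temperatures within one divided range; the nine temperature values below are hypothetical stand-ins for R1's actual samples:

```python
def build_hybrid_set(image_ids, sampled_temps):
    """Blend every image at every sampled temperature of one divided
    range into a single hybrid training set of (image, temperature)
    pairs; 68 images at nine temperatures give the 612 pairs of R1."""
    return [(img_id, t) for t in sampled_temps for img_id in image_ids]
```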

Temperature-Controllable Automatic Data Acquisition
As shown in Figure 3, for acquiring substantial datasets automatically, we built a display-capture setup [28,29] including a high-resolution, calibrated LCD monitor (Asus PA32U) and a shallow-designed phone module (Samsung GN1). Apart from acquiring massive datasets without manual effort, this setup also captures the real noise and blur of both the optical design and the image signal processing (ISP) together, instead of dealing with only one of them. Therefore, a model trained on datasets acquired from such a setup has more potential to generalize to complex real-world environments such as extreme temperatures.
To acquire a series of datasets containing accurate temperature features, we build a chamber with a control installation for heating and cooling. The chamber imitates the environmental temperature where the camera is placed, ranging between 0°C and 60°C, with a fluctuation (1°C) much smaller than the specified step. During shooting, the chamber maintains the temperature at a desired constant value, and the camera captures images shown on the monitor through a piece of high-transparency white glass. Besides, an extra temperature detector (DS18B20) is installed beside the camera for precise temperature feedback to the control PC.

System Specification
www.advancedsciencenews.com www.advintellsyst.com

We assign 612 images from the Adobe 5K collection [30] as our training dataset and 50 as the test set. For robustness, the distribution of the test set should not overlap with the images used for the temperature division in Section 3.2. Implemented with PyTorch, the training of the recovery networks takes 250 epochs on a single graphics card (NVIDIA RTX 3090), and the simplified architecture allows them to run on any GPU with more than 6 GB of memory. In the training phase, we crop 512 × 512 patches from the blur dataset and their corresponding GT, with random flipping and rotation for data augmentation. Adopting the well-known Adam optimizer, the learning rate is initially set to 10^-5 and linearly decreased to 0. During each iteration, the discriminator D is updated 5 times while the generator G is updated once.

Figure 3. Temperature-controllable display-capture setup. A chamber offers a constant-temperature environment for the phone module, which captures the dataset displayed on the monitor. In the testing phase, the detector feeds the temperature back in real time to the control PC to select a suitable branch model.
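The update schedule and learning-rate decay described above can be sketched as follows (a sketch only; real training code would step the two Adam optimizers at these points):

```python
def training_schedule(n_iters, d_per_g=5):
    """Order of parameter updates: the discriminator D is stepped
    d_per_g times for every single generator G step, as in
    WGAN-style training."""
    steps = []
    for _ in range(n_iters):
        steps += ["D"] * d_per_g + ["G"]
    return steps

def lr_at(epoch, total_epochs=250, lr0=1e-5):
    """Learning rate linearly decayed from lr0 to 0 over training."""
    return lr0 * (1.0 - epoch / total_epochs)
```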

Simulated Results
For intuition and clarity, we utilize two kinds of vehicle lenses to demonstrate the feasibility of the temperature-robust strategy from 0°C to 80°C. One of them, with athermalization, provides stable-quality images at different temperatures, while the other suffers from thermal defocus at extreme temperatures due to its shallow optical design. For the training and test sets, we resize the GT images to 1920 × 1080 to match the common sensor specifications of these vehicle lenses. We obtain PSFs at various temperatures through the thermal analysis function of OpticStudio [31] and simulate blur images via convolution operations. For each branch model, we report both the PSNR and LPIPS [32] of the recovered images and assess them by average values, where PSNR exhibits pixel-wise performance and LPIPS represents the similarity of high-level features. An extra assessment using the structural similarity (SSIM) [33] can be found in the Supplement, as can an experiment at extremely low temperatures (below 0°C), which again robustly manifests the validity of the proposed scheme.
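For reference, the PSNR metric used throughout reduces to a one-liner on the mean squared error (the standard definition, shown here only to fix conventions):

```python
import numpy as np

def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with values in
    [0, peak]; higher means the test image is closer to the reference."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(test, float)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))
```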

Ablation Study
To demonstrate the effectiveness of the scheme proposed in Section 3, we conduct several ablation experiments concerning the lightweight architecture, the multibranch scheme, and the temperature division and mixture. For brevity, we refer to the models trained under the different dataset strategies as Branch Discrete, Branch Hybrid, and All Hybrid. Each dataset for Branch Discrete is obtained at only one temperature, while that of Branch Hybrid mixes images from multiple temperatures within the same divided range. In contrast, All Hybrid simply blends all datasets over the entire temperature region without branches. The above datasets exhibit the same scale but different temperature distributions.
Lightweight Architecture: We note that the depth of the recovery network depends on the degradation level, and images captured by the vehicle lens do not require a particularly deep network architecture, [29] termed the Original model. Therefore, we reduce the number of network blocks in the generator of the Original and construct a lightweight architecture as the Current model. As shown in Table 1, trained on the same dataset with the All Hybrid strategy, the number of parameters to optimize in Current is reduced to only one-eighteenth of the Original; meanwhile, the image quality does not drop noticeably and even improves in several ranges. This significant reduction in parameters makes it possible to reproduce high-quality images on microprocessors that already exist in many imaging systems.
Multibranch Scheme: The opposite of the multibranch strategy is mixing all datasets over the entire range without considering their temperature deviation, i.e., the All Hybrid model. As shown in Table 2, in terms of both PSNR and LPIPS, the multibranch models, Branch Discrete and Branch Hybrid, perform better than All Hybrid over almost the entire temperature range. The experimental results illustrate that the blind mixture may induce crosstalk between the high-level features of different corruptions in latent space and burden the learning of the network, and that it can be counterproductive to account for thermal corruption by roughly constructing a dataset containing all kinds of degradation. Therefore, the multibranch scheme is beneficial for addressing degradation over a wide range of temperatures.
Temperature Division and Mixture: Comparing Branch Discrete and Branch Hybrid in Table 2, the PSNR values of the Branch Hybrid recovery increase noticeably and outperform the other models at each temperature. Although the Branch Hybrid model cannot achieve the best LPIPS at all temperatures, it still produces values comparable to Branch Discrete in most ranges and significantly outperforms All Hybrid. With a significant reduction in the number of branches, as well as the best PSNR and comparable LPIPS, our multibranch model with appropriate temperature division and mixture proves to be a promising scheme for delivering high-quality images over a wide range of temperatures.

Temperature-Robust Image Recovery
As the temperature increases, the textures and outlines of objects become indistinguishable beyond 60°C, such as the stone carvings on the wall, the edges of tree branches, and the patterns on the lifebuoy in Figure 4. Benefiting from the division and mixture scheme, images generated by our multibranch model show superior visual quality in clarity, sharpness, and detail compared to the blur ones. In addition, as shown in Figure 5, the PSNR values of images recovered by our model are all over 2.0 dB higher than those of the blur ones at 0-80°C. Meanwhile, we report the recovery results below 0°C, revealing that our multibranch strategy is compatible with extremely low temperatures. The detailed experiments can be found in Section 4.2 of the Supplement. To conclude, the proposed multibranch model significantly improves the quality of corrupted images, both visually and quantitatively, and makes them as clear and photorealistic as possible.

Table 1. Ablation study of models with different parameters. Numbers indicate the number of parameters to optimize. We reduce the number of network units in the Original model to obtain the Current model. The PSNR and learned perceptual image patch similarity (LPIPS) metrics showcase that although the number of parameters drops dramatically, the quality of the recovery does not decline quickly and even increases in some ranges.
Although traditional passive optical athermalization offers relatively stable image quality over the entire temperature range, it trades off performance at 0-40°C, the most commonly used perception scope in daily life. By comparison, our model produces better results at −20 to 60°C, which covers most application scenarios. We note that the recovery PSNR values are slightly inferior to athermalization beyond 60°C, where the multibranch model still improves the blur images by 2.36 dB on average.
Notably, we adopt the same network architecture, hyperparameters (e.g., learning rate, feature channels, etc.), and scale of the training dataset for a fair comparison at all temperatures, leading to results inferior to athermalization at extremely high temperatures. In Section 4.3 of the Supplement, we further explore fine-tuning the parameters to enable the resolved images to outperform those captured with athermalization at extreme temperatures. For instance, with data augmentation, our multibranch model can significantly increase the improvement in PSNR, marked as the yellow star in Figure 5. Nevertheless, our multibranch strategy is an effective yet simple solution to free optical systems from a troublesome design process.

Real-Capture Experimental Results
To verify the practicability of our strategy, we test on a phone camera module consisting of several plastic aspheric lenses, which lead to unneglectable thermo-optic effects. To acquire massive datasets at different temperatures, we use the aforementioned display-capture setup with the module fixed in the chamber (see Figure 3). Based on the daily usage of phones, we adopt 0-60°C as the experimental region and calculate the PSNR of the blur test sets 10°C apart. Analyzing the PSNR-temperature plot, we divide this range into three subranges of 0-10°C, 20-50°C, and 60°C. Subsequently, we blend the datasets inside the subranges and train a model for each to combine into the final multibranch reconstruction model. As noted, the display-capture setup also records the thermal noise in the dataset, which is consistent with the actual situation (see division details of the phone module in Section 3 of the Supplement).

Figure 4. Recovery results of the multibranch model at different temperatures. Although the blur (rows 1, 4) aggravates as the temperature moves away from the design one, the recovery results (rows 2, 5) are relatively robust and appear similar to or even slightly better than the images directly obtained by the vehicle lens with athermalization (rows 3, 6).

Display-Capture Verification
We reserve 50 display-captured images at each temperature as test sets. From the results shown in Figure 6, we observe that the image quality drops significantly when the temperature deviates from 20°C, especially at 0°C and 60°C. With our multibranch model retrained on the display-capture dataset, the PSNR increases by 0.93 dB on average and by 1.5 dB at extremely low temperatures. The LPIPS showcases the same tendency, with an average drop of 0.09. Visually, the recovered images reproduce more identifiable details even at high frequencies, e.g., the pattern on the parrot's face and the zebra crossing on the overpass in Figure 6. In addition, the captured images exhibit a slight color bias due to the limited spectrum of the LCD monitor and the built-in ISP. Our model also corrects the tone automatically and leads to more photorealistic results.

Generalization Analysis
Although images displayed on the monitor cannot faithfully match reality due to the limited dynamic range and spectrum of the screen, our model trained on such datasets still works in real scenes to some extent. We demonstrate the generalization with recovery results in several outdoor and indoor scenarios at low, mid, and high temperatures, as shown in Figure 7.
When capturing in the wild, the detector delivers temperature feedback for automatic branch selection. Experimental results suggest that our model performs fairly well in practice, e.g., clearer textures of the stone lion, sharper edges of the succulent plant, and more natural color in the overall images.

Limitations and Future Work
We are aware that several limitations in our work hinder us from further exploring the temperature impact on shallow-designed imaging systems. For instance, it is straightforward to facilitate image degradation at extreme temperatures by simulation, but cumbersome and unruly in the real world. While the temperature rises in the chamber, lighting conditions and content features of the scene are likely to change, hampering comparison across temperatures. In addition, the depth of field brings blur bokeh that shows visual effects similar to defocus blur, raising difficulties for our model in recovery. Further discussion can be found in Section 4.4 of the Supplement. As a next step, more advanced network architectures may be worth exploring to address blur images with excessive degradation. To this point, our model can still be valuable in constructing complex but efficient image recovery networks at extreme temperatures while retaining a lightweight architecture in room-temperature regions, thanks to the independent branches. Meanwhile, the enormous network architecture requirements urge us to seek acceleration of the network processing, which is crucial for applications such as vehicle cameras. Last but not least, the proposed temperature-robust imaging scheme can be elaborated to combat the impact of other damp or dusty environments, for which there are to date no proper metrics, through emerging transfer learning strategies.

Figure 6. Experimental recovery results based on the dataset acquired from the display-capture setup. Extremely low or high temperatures aggravate image degradation, while our model still realizes relatively robust high-quality results with details and true color. The labels illustrate the average improvement in PSNR and LPIPS of images at different temperatures.

Conclusion
We have developed a temperature-robust learned image recovery scheme for shallow-designed imaging systems and verified its superiority both theoretically and experimentally. To tackle feature crosstalk due to temperature deviations, a series of datasets is facilitated to train the corresponding recovery networks as branches of the full model. To reduce computational expenses, a scheme of temperature division and mixture of multiple datasets is particularly exploited. Experimental results showcase the feasibility of our temperature-robust learned model, which outperforms traditional athermalization approaches both visually and quantitatively. Overall, this is a novel approach to athermalization technology from the emerging deep learning perspective. Without complex optical designs or expensive materials, it allows shallow-designed imaging systems to acquire high-quality images at unconventional temperatures and at lowered manufacturing costs.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.