Learning an Empirical Digital Twin from Measurement Images for a Comprehensive Quality Inspection of Solar Cells

Measurement images of solar cells contain information about their material‐ and process‐related quality beyond current–voltage characteristics. This information is currently only partially used because most algorithms look for human‐defined image features or defects. Herein, a purely data‐driven method is proposed to derive the essential image information in terms of the electrical quality within a comprehensive and meaningful representation. This representation is denoted as the empirical digital twin of the cell. Using it, solar cells can be classified according to their defects visible in the measurement images. For this purpose, a human‐in‐the‐loop approach to efficiently develop a classification scheme is presented. To derive the digital twin, a convolutional neural network is designed that combines various measurement data of a sample by correlating them with quality parameters. The digital twin is an intermediate representation of the network that captures the quality‐relevant defect signatures in the images. Human experts can analyze this representation space to identify defect clusters that relate to different process errors, such as finger interruptions and shunts. It is shown how the representations can be used to derive sorting criteria for quality inspection. Finally, it is demonstrated how the empirical digital twin and the sorting scheme can be used to segment the defects without additional labeling effort.


Introduction
Imaging measurement methods provide valuable information for solar cell characterization. A number of them, such as electroluminescence (EL), photoluminescence (PL), and infrared thermography (IR), are widely used inline because they are fast enough for current production lines. [1][2][3][4][5] They reveal material- and process-related defects, which manifest, for example, as reduced radiative recombination. Examples include microcracks, shunts, edge isolation problems, and resistivity defects in emitters and metal contacts. These methods extend the classical characterization based on the analysis of current-voltage (IV) curves by allowing a spatially resolved evaluation of the cell and conclusions about the defect origins.
Despite the high information content of the measurements, even experts cannot fully evaluate the data. Due to overlapping defect structures and the high dimensionality of the images, it is hard to estimate the influence of a measurement feature, be it a defect or even a "good" spot, on quantities such as the cell efficiency. This is currently done by experts or supported by reference measurements. [6,7] Inline approaches for this purpose are currently being developed, [8] but at present they require a special measurement setup and evaluation algorithms for image analysis.
Many approaches process measurement images such as EL images, mainly for defect detection. Two main types of defect detection approaches can be found in the literature: 1) human-made heuristic filters and 2) end-to-end methods based on human-made labels. In the category of human heuristics (1), experts develop filters that either search for specific defect-typical structures (dark areas, finger interruptions, etc.) or extract features that serve as input to a machine learning (ML) model, such as a support vector machine (SVM), [9] for further processing. In recent years, end-to-end approaches (2) have been published increasingly, in which the measurement images are mostly processed by deep convolutional neural networks (CNNs) or other ML models. The main difference is that the filters are no longer developed by humans but are optimized empirically on a large database of images and labels. However, these methods still require defect annotations by human experts as a basis.
Both approaches, whether based on image processing methods or on CNNs, have disadvantages that limit their usefulness and applicability. One drawback concerns the labeling process and the labels themselves. Labels as well as heuristic filters are error-prone, [10] time-consuming, and costly. They also use only part of the information contained in the measurement images because they focus, e.g., on the detection of a single defect. Another disadvantage concerns the usability of the results. Their transferability to other process lines is limited, so relabeling is necessary. In addition, heuristics are required to apply the results, e.g., for process optimization or the detection of defect causes.
To address these issues, we propose the empirically learned digital twin of the solar cell. We show how it circumvents the aforementioned problems and how it can be brought into practice quickly with little labeling overhead. Specifically, our contributions are as follows:

1) we present a sensor fusion approach to learning an empirical digital twin (EDT) of the solar cell from multiple measurement images describing its current quality state;
2) we utilize the EDT for defect detection within a "human-in-the-loop" approach to reduce labeling effort while maintaining or even improving the detection rate; and
3) we show a weakly supervised approach for spatially resolved defect segmentation that builds on the human-in-the-loop approach and requires no further labeling work.

Related Work
Many works focus on detecting defects in EL or PL images using classical image processing techniques. The algorithms use advanced image processing to detect microcracks and finger interruptions in EL images [11][12][13][14][15][16][17][18] or PL images. [19,20] For this, filtering methods are used either to find the sought defect directly or to extract features that allow an ML model such as a support vector machine to perform defect classification.
With the success of deep learning models in image processing, they have also been applied to defect detection in solar cells. This usually involves expert labeling of defects in a large dataset of images of cells or even whole modules, so that a CNN can be trained for automatic detection. The latest works present CNN-based methods that detect microcracks, finger interruptions, or dislocation structures in a spatially resolved manner in EL images of cells or modules. [21][22][23][24][25][26][27][28][29][30][31][32] For pure defect classification without segmentation, i.e., determining whether a defect such as a microcrack or a finger interruption is present, many CNN approaches using EL and IR images of cells and modules have been proposed. [33][34][35][36][37][38][39][40][41][42][43][44] To address the problem of limited data, a method based on generative adversarial networks (GANs) has been demonstrated that improves prediction results by means of artificially created EL images. [44] Both segmentation and classification approaches vary strongly in terms of CNN architecture, dataset, data processing, and targeted defect, so their prediction quality can only be compared to a limited extent. [45] The approaches described achieve high detection quality but share drawbacks, which our approach addresses. Some CNNs reach detection rates above 90%, showing that they are well suited to finding labeled defect structures. However, the defects must first be labeled by an expert, which is time-consuming and correspondingly expensive. Furthermore, it has been shown that even experts have difficulty finding all defects and that human detection rates can vary strongly. [10,32] This makes the labeling process error-prone and can lead CNNs to identify false structures as defects. The time-consuming labeling process becomes even more problematic when the algorithms have to be adapted to other cell lines.
In this case, the labeling must be repeated for cells from the new line to achieve similarly high detection results. We try to overcome these problems by predicting measured quantities rather than the defects themselves, thereby letting the CNN learn the defect structures indirectly.
Apart from defect detection, other approaches are more closely related to the one used in this article. They learn a compact representation by compressing measurement data via a deep neural network (NN) that predicts quality or process parameters. For the quality rating of as-cut wafers, IV parameters are learned from PL images and visualized to evaluate wafer quality early in the production process. [46,47] A similar modeling approach correlates IV curves or process parameters with material parameters to obtain representations that cluster into groups with similar properties. [48,49] In one approach, a feature vector is retrieved by a pretrained CNN and subsequently used for defect classification. [50] In another work, features are extracted by expert-designed algorithms and afterward used to predict quality variables. [51] A further related approach first trains a CNN to detect low-quality cells and then derives feature vectors to perform bin classification. [52]

Approach

Overview
We propose three sequential algorithms for solar cell quality inspection using measurement data and expert knowledge. First, we derive a comprehensive representation of each solar cell by compressing the measurement data. Section 3.2 explains how this EDT is derived from the intermediate representation of a deep learning model. The second algorithm brings the representation into production. In Section 3.3, we show an efficient way to use the digital twin for quality inspection by defining a classification scheme with the support of human expertise. Finally, we combine both models to enable defect segmentation without any additional labeling effort, as presented in Section 3.4.

Learning the EDT
We obtain a meaningful representation by deriving features from the images that are expressive in terms of physical quality parameters such as IV parameters. As shown in Figure 1, we train a regression network to predict the quality parameters from measurement images. In general, the network consists of several sequential convolutional and pooling layers with rectified linear unit (ReLU) activation functions. As the network progresses, the spatial resolution of the feature maps therefore decreases, whereas their semantic content with respect to the predicted quantities increases. Our model can process multiple input data $x_i$ measured for each sample $i$. In this example, we design a deep learning model for EL images, IR images, and reflectance values. The CNN, denoted as the function $f_\theta: x \mapsto y$, performs a multivariate regression of the quality parameters, which are summarized in the vector $y_i$. Here, $\theta$ holds the model's parameters. Through the regression, the dimensions of the image are reduced step by step, and the measurement data are combined in a meaningful way with regard to the quality parameters. The last neurons of the CNN form a vector $\rho_i$ that encodes the relationship between the measurement images and the output quality quantities. In the following, we denote this vector representation of the input data as the empirical digital twin of the solar cell.
With the EDT, measurement images, or even combinations of measurement images, can be compared quantitatively. In Figure 1, the fifth column shows the distribution of the features $\rho_{j,i}$ in $\rho_i$ for the two cells examined. Here, $j$ is the index within $\rho_i$, and $i$ denotes the corresponding cell in the dataset. The expression of image features $\rho_{j,i}$ can thus be compared using a few numbers. For sample 1, $\rho_2$ (cyan) is particularly low, and for sample 2, $\rho_3$ (yellow) is. Below, the extreme examples of the dataset are shown for these features, revealing very different defect characteristics. These correspond to certain loss patterns, such as bad contact formation or finger interruptions. The loss patterns also occur in the two input examples, highlighted by cyan and yellow rectangles, which therefore have low values for the corresponding features.
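Because each EDT is simply a vector of numbers, the comparisons above reduce to sorting operations. The following minimal sketch uses made-up 4-entry EDTs; the real EDTs have 720 entries, and the function names are hypothetical. It finds the most conspicuously low features of one cell, and the cells with the most extreme value of a given feature (the "extreme examples" in Figure 1). Indices are 0-based here, whereas the figure numbers features from 1.

```python
# Hypothetical EDTs: edts[i][j] holds feature rho_{j,i} of cell i (0-based indices).
def low_features(edt, k=2):
    """Indices j of the k lowest-valued features within a single EDT."""
    return sorted(range(len(edt)), key=lambda j: edt[j])[:k]


def lowest_cells_for_feature(edts, j, k=3):
    """Indices i of the k cells with the lowest value of feature j in the dataset."""
    return sorted(range(len(edts)), key=lambda i: edts[i][j])[:k]


edts = [
    [0.9, 0.1, 0.8, 0.7],  # cell 0: feature 1 is conspicuously low
    [0.8, 0.9, 0.2, 0.6],  # cell 1: feature 2 is conspicuously low
    [0.7, 0.6, 0.5, 0.9],
    [0.9, 0.3, 0.9, 0.8],
]
print(low_features(edts[0], k=1))                # [1]
print(lowest_cells_for_feature(edts, j=2, k=2))  # [1, 2]
```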
Several measurement images and data types can be fed into the model, from which the digital twin is calculated. In our implementation, EL and IR images, which reveal various defects, are processed together with reflectance measurements. Figure 2 shows the architecture of the network.

The two input images, i.e., the EL image $x_{EL}$ and the IR image $x_{IR}$, are first mapped by separate CNNs, $f_{\theta_{EL}}$ and $f_{\theta_{IR}}$, to their own representation tensors $P_{EL}$ and $P_{IR}$. These are subsequently concatenated with the reflectance values $x_{refl} = (r_{390nm}, r_{950nm})^T$ at 390 and 950 nm wavelengths. The reflectance values are scaled up to a two-channel tensor $X_{refl}$ of the same spatial dimensions. The resulting concatenated tensor $X_{cb}$ is passed to the CNN $f_{\theta_{cb}}$, which combines the information from the different sources. The resulting representation $P_{cb}$ is passed to further functions, one for the prediction of each output, e.g., $f_{\theta_{V_{oc}}}$ and $f_{\theta_{R_p}}$. From each of these, a part of the EDT $\rho$ is derived, so that $\rho$ is composed of $\rho_{V_{oc}}$, $\rho_{R_p}$, etc. In our implementation, the individual parts each have 40 entries, which corresponds to $m = 18 \times 40 = 720$ entries for $\rho$ with 18 predicted parameters.

The partial representations are directly related to the predicted quantities because only the scalar product with the weight vectors $w_{V_{oc}}$, $w_{R_p}$, etc. is taken for the prediction. The weights scale the extracted image features in $\rho_{V_{oc}}$, $\rho_{R_p}$, etc. so that their sum yields the respective output quantity. The parameters $\theta$ of the CNNs are optimized during training.

Figure 1. Schematic illustration of the calculation of the EDT of the solar cell for two example cells. Input measurement images (first column) are passed to a convolutional neural network (second column), which predicts the quality variables $y$ (third column). The EDT $\rho$ can be derived from the model (fourth column). Entries in $\rho$ correspond to quality-describing image features $\rho_{j,i}$, which can be compared with each other (fifth column).

Figure 2. Overview of the model structure for sensor fusion. The EL and IR images are each passed to a model. The resulting intermediate representations $P_{EL}$ and $P_{IR}$ are concatenated with the reflectance values $x_{refl} = (r_{390nm}, r_{950nm})^T$ to $X_{cb}$. Then, the model $f_{\theta_{cb}}$ combines the information into $P_{cb}$, which is then given to further functions, one for each predicted parameter. As a result, there is one vector representation per predicted parameter ($\rho_{V_{oc}}$, …). When concatenated, they form the empirical digital twin $\rho$. Each parameter is predicted as the scalar product of its subrepresentation and a weight vector $w$.

www.advancedsciencenews.com www.solar-rrl.com
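The relation between the subrepresentations, the EDT, and the predictions can be illustrated with plain lists. This is a minimal sketch under stated assumptions. Random numbers stand in for the pooled outputs of $f_{\theta_{V_{oc}}}$, $f_{\theta_{R_p}}$, and so on, and any bias term of the final linear layer is omitted for clarity. Only the sizes (18 parameters, 40 entries each) come from the text.

```python
import random

N_PARAMS = 18  # predicted quality parameters (Table 1)
SUB_DIM = 40   # entries per sub-representation, e.g. rho_Voc


def assemble_edt(sub_reps):
    """Concatenate the sub-representations rho_Voc, rho_Rp, ... into the EDT rho."""
    return [value for rho_p in sub_reps for value in rho_p]


def predict(sub_reps, weights):
    """Each output is the scalar product of its sub-representation and weight vector."""
    return [sum(r * w for r, w in zip(rho_p, w_p)) for rho_p, w_p in zip(sub_reps, weights)]


rng = random.Random(0)
# Random stand-ins for the pooled outputs of f_theta_Voc, f_theta_Rp, ...
sub_reps = [[rng.random() for _ in range(SUB_DIM)] for _ in range(N_PARAMS)]
weights = [[rng.uniform(-1.0, 1.0) for _ in range(SUB_DIM)] for _ in range(N_PARAMS)]

rho = assemble_edt(sub_reps)
y_hat = predict(sub_reps, weights)
print(len(rho), len(y_hat))  # 720 18
```

Because each output depends only on its own 40-entry slice of $\rho$, every slice can be read as the image evidence for one specific quality parameter.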
The prediction vector $y$ contains 18 quality variables, which are listed in Table 1 in columns one and two. These include typical quantities, such as the open-circuit voltage $V_{oc}$, short-circuit current density $J_{sc}$, fill factor FF, and efficiency $\eta$. They also include specific parameters, such as the pseudo-fill factor pFF, the ideal fill factor FF$_0$, or their difference FF$_0$ − pFF. The different FFs and their differences are used to obtain quantities that break down the individual losses. For example, pFF contains no series resistance losses, whereas pFF − FF mainly contains series resistance losses. FF$_0$ contains losses related to the saturation current density of the first diode $J_{01}$. FF$_0$ − pFF contains losses due to the parallel resistance and the saturation current density of the second diode $J_{02}$. For the parallel resistance $R_p$, the natural logarithm is predicted because $R_p$ follows a logarithmic distribution; otherwise, its large values would dominate the optimization. We have chosen a large number of quality parameters to obtain a comprehensive description of the measurement patterns. In general, however, it remains an open question how many and which parameters are suitable for a good representation. Pending further investigation, we suggest using all accessible information.
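Two of the target transformations described above can be sketched in a few lines: the fill-factor differences used as loss-specific targets, and the logarithm of $R_p$. The function names and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def fill_factor_losses(ff, pff, ff0):
    """Fill-factor differences that separate loss mechanisms (same units as the inputs)."""
    return {
        "pFF - FF": pff - ff,    # mainly series-resistance losses
        "FF0 - pFF": ff0 - pff,  # parallel-resistance and J02 losses
    }


def rp_target(rp):
    """R_p spans orders of magnitude, so its natural logarithm is used as the target."""
    return math.log(rp)


# Hypothetical values for illustration only (FFs in %, R_p in arbitrary units)
print(fill_factor_losses(ff=78.0, pff=82.5, ff0=84.0))      # {'pFF - FF': 4.5, 'FF0 - pFF': 1.5}
print([round(rp_target(rp), 2) for rp in (1e2, 1e3, 1e5)])  # [4.61, 6.91, 11.51]
```

In this example, $R_p$ spans three orders of magnitude, but the regression targets differ by less than 7 units. As a result, cells with very high $R_p$ no longer dominate a loss such as the mean absolute error.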

The EDT for Defect Classification with Human-in-the-Loop
We can use the EDT for quality inspection of solar cells by deriving a sorting scheme that classifies samples into defect classes. When measurement images are used directly for defect classification, a large number of labeled images is necessary for a CNN to learn the defect structures correctly. With the EDT, however, these structures have already been learned indirectly, so we assume that fewer labels are needed. In addition, the essential image properties are already compressed in the EDT. This results in significantly fewer dimensions and correspondingly lower complexity, which makes a model easier to optimize.
To efficiently incorporate expert knowledge into defect detection, we propose an iterative human-in-the-loop approach, also referred to as "active learning." The approach, visualized in Figure 3, consists of four steps:

1) find a small initial selection of EDTs that is representative of the dataset;
2) label them using the corresponding measurement images;
3) train an NN for defect classification; and
4) apply the model to the remaining data to compute its uncertainty (the orange question marks in Figure 3), and pass only the uncertain samples back to the expert for labeling.

The NN is expected to learn the most from these uncertain samples. Steps (2)-(4) can be repeated several times to efficiently optimize the NN.

Figure 3. Schematic representation of the human-in-the-loop approach. After the EDTs of the cells have been calculated, 1) a representative initial selection in EDT space is chosen, which is then 2) labeled by experts. Based on this, 3) an NN is trained toward a sorting scheme. Afterward, the uncertainty per EDT is calculated for the remaining dataset so that the most uncertain ones can be labeled in turn by the expert. Steps (2)-(4) are repeated iteratively.

In the following, we describe the steps in more detail. For the initial, representative selection of EDTs (1), we use k-means clustering [9] with $n_{cl}$ clusters. For each cluster, the centroid is computed. The EDT $\rho$ closest to the centroid in the high-dimensional space is then selected for labeling in (2) using the corresponding measurement images. For training and uncertainty calculation in steps (3) and (4), we use the dropout-based uncertainty approximation. [53,54] Here, the NN, described by $g: \rho \mapsto p_d \in [0, 1]$, receives the EDT vector $\rho$ as input and predicts the defect probability $p_d$. Based on this, the entropy $H$ can be calculated as in Equation (1):

$$H(\rho) = -p_d \log p_d - (1 - p_d) \log(1 - p_d) \qquad (1)$$

Dropout [55] randomly sets activations in the NN to 0 with some probability. Applying it while passing the same ($i$th) EDT $\rho_i$ through the NN multiple times yields an entropy distribution. As shown in Equation (2), we use the mean value of this entropy distribution as the uncertainty value $U$:

$$U(\rho_i) = \frac{1}{T} \sum_{t=1}^{T} H_t(\rho_i) \qquad (2)$$

Here, $T$ is the number of forward passes of the same EDT, and $H_t$ is the entropy obtained in the $t$th pass.
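Equations (1) and (2) can be illustrated with a deliberately tiny stand-in for $g$. The sketch below uses a single linear unit with a sigmoid and inverted dropout on its inputs; this is an assumption for illustration, since the paper's classifier is a three-layer NN. The function names, weights, and dropout rate are all hypothetical. An EDT that produces $p_d = 0.5$ in every pass reaches the maximum binary entropy $\ln 2$. An EDT with a large logit yields low uncertainty.

```python
import math
import random


def binary_entropy(p, eps=1e-12):
    """Equation (1): entropy of the predicted defect probability p_d (in nats)."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def dropout_forward(rho, weights, bias, drop_rate, rng):
    """Toy stand-in for g: one linear unit + sigmoid, with inverted dropout on the inputs."""
    z = bias
    for x, w in zip(rho, weights):
        if rng.random() >= drop_rate:       # activation is kept
            z += w * x / (1.0 - drop_rate)  # rescale kept activations
    return 1.0 / (1.0 + math.exp(-z))


def uncertainty(rho, weights, bias, T=200, drop_rate=0.5, seed=0):
    """Equation (2): mean entropy over T stochastic passes of the same EDT."""
    rng = random.Random(seed)
    entropies = [binary_entropy(dropout_forward(rho, weights, bias, drop_rate, rng))
                 for _ in range(T)]
    return sum(entropies) / T


weights, bias = [1.0, 1.0, 1.0, 1.0], 0.0
confident_edt = [3.0, 3.0, 3.0, 3.0]  # large logit in most passes
ambiguous_edt = [0.0, 0.0, 0.0, 0.0]  # p_d = 0.5 in every pass
print(uncertainty(confident_edt, weights, bias) < uncertainty(ambiguous_edt, weights, bias))  # True
```

$T = 200$ matches the value used in experiment 3. With dropout switched off, every pass would be identical, and $U$ would collapse to the single-pass entropy.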

Defect Segmentation with Human-in-the-Loop
To perform a spatially resolved defect segmentation without any additional labeling, we propose applying the classification scheme from Section 3.3 to spatially resolved feature maps in a weakly supervised fashion. The procedure is shown in Figure 4. To compute the EDTs, the global average is taken over the feature maps computed in the CNNs $f_{\theta_{V_{oc}}}$, $f_{\theta_{R_p}}$, etc. Subsequently, the sorting scheme can be learned on the EDTs according to Section 3.3. The same sorting scheme can then be applied to each pixel of the feature maps before they are aggregated into the EDT. In this way, the cell can be segmented into defect and nondefect areas at a coarse resolution. The prerequisite for this method is that the defect-typical feature distributions are similar for the global and the local representation. This should hold if a single type of defect or defect combination occurs in the image and dominates the classification training. If different defects are distributed over the sample, the spatially resolved representation of the sample deviates from the EDT.
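The idea of reusing a classifier trained on pooled features at the pixel level can be sketched with nested lists. As a simplifying assumption, the sorting model here is a single hypothetical linear unit, whereas the paper's classifier is a small MLP. The two 2 × 3 feature maps and the weights are invented. For a linear model, the logit of the globally averaged features equals the mean of the per-pixel logits. For a nonlinear MLP, this equality holds only approximately, which is why the similarity condition stated above matters.

```python
import math


def global_average(feature_maps):
    """EDT entries: spatial mean of each C x H x W feature map (nested lists)."""
    return [sum(map(sum, fm)) / (len(fm) * len(fm[0])) for fm in feature_maps]


def logit(features, weights, bias):
    """Hypothetical linear sorting model (the paper's classifier is a small MLP)."""
    return bias + sum(w * f for w, f in zip(weights, features))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def segment(feature_maps, weights, bias, threshold=0.5):
    """Apply the sorting model trained on EDTs to the feature vector at every pixel."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [sigmoid(logit([fm[y][x] for fm in feature_maps], weights, bias)) >= threshold
         for x in range(width)]
        for y in range(height)
    ]


# Two hypothetical 2 x 3 feature maps; feature 0 is high in the two right columns.
fmaps = [
    [[0.0, 4.0, 4.0], [0.0, 4.0, 4.0]],
    [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
]
w, b = [1.0, -1.0], -1.0
edt = global_average(fmaps)
print(sigmoid(logit(edt, w, b)) >= 0.5)  # True: the cell as a whole is flagged
print(segment(fmaps, w, b))              # [[False, True, True], [False, True, True]]
```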

Experimental Section
The dataset used herein consisted of 1600 industrially processed Cz-Si passivated emitter and rear cells (PERCs) of size 156 × 156 mm². These cells had been sorted out during production due to electrical and optical defects. Accordingly, they contained a large number of different defects, including shunts, finger interruptions, microcracks, overfired regions, scratches, poor contacts, and combinations thereof. The dataset was randomly divided into three subsets: 70% of the cells were assigned to the training dataset, 10% to the validation dataset, and 20% to the test dataset. For each cell, EL and IR measurements were performed with a system from h.a.l.m. electronic GmbH. The cells were excited with 20 A, the integration time of the Si CCD camera was 50 ms, and a gain factor of 3 was applied. In addition, the reflectance at 390 and 950 nm wavelengths and the parameters listed in Table 1 were measured.

A variation of DenseNet [56] was used as the CNN. A DenseNet consists mainly of a sequence of DenseBlocks and TransitionBlocks. A DenseBlock is characterized by its number of convolutional layers (length). The output of each layer is passed to all following layers of the DenseBlock, not only to the next one, so that each layer can access all previous information. The number of feature maps added per layer in the DenseBlock is called the growth rate. This structure can be memory-intensive, which is why consecutive DenseBlocks are connected by TransitionBlocks. These compress the information of a DenseBlock by a reduction factor.
The paths for processing EL and IR images ($f_{\theta_{EL}}$, $f_{\theta_{IR}}$) each consisted of three DenseBlocks of length 5 with a growth rate of 24. The combining function $f_{\theta_{cb}}$ was a DenseBlock with the same parameters. The TransitionBlocks had a reduction factor of 0.8. Each output prediction function consisted of a DenseBlock with length 6 and growth rate 30, followed by a 1 × 1 convolution with 40 output channels (feature maps in Figure 4), global average pooling (resulting in the EDTs), and a linear layer for output regression.
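The block hyperparameters determine how many feature maps flow through the network, and this can be traced with simple arithmetic. The sketch below rests on several assumptions that the text does not specify:

- a hypothetical stem of 48 channels before the first DenseBlock;
- TransitionBlocks that keep ⌊0.8·C⌋ channels and sit only between the three DenseBlocks of each path;
- no TransitionBlock after $f_{\theta_{cb}}$.

Only the lengths, growth rates, reduction factor, two reflectance channels, and the final 40 channels come from the text.

```python
import math


def dense_block(c_in, length, growth_rate):
    """Each of the `length` layers appends `growth_rate` feature maps."""
    return c_in + length * growth_rate


def transition(c_in, reduction):
    """Assumed TransitionBlock: keep floor(reduction * c_in) feature maps."""
    return math.floor(reduction * c_in)


def image_path(c_stem, n_blocks=3, length=5, growth_rate=24, reduction=0.8):
    """Channel counts through f_theta_EL or f_theta_IR (TransitionBlocks between DenseBlocks)."""
    trace = [c_stem]
    c = c_stem
    for i in range(n_blocks):
        c = dense_block(c, length, growth_rate)
        trace.append(c)
        if i < n_blocks - 1:
            c = transition(c, reduction)
            trace.append(c)
    return trace


C_STEM = 48  # hypothetical channel count after the (unspecified) initial convolution
el_path = image_path(C_STEM)
ir_path = image_path(C_STEM)
c_cat = el_path[-1] + ir_path[-1] + 2                 # P_EL, P_IR, and two reflectance channels
c_cb = dense_block(c_cat, length=5, growth_rate=24)   # f_theta_cb
c_head = dense_block(c_cb, length=6, growth_rate=30)  # one output function, before the 1x1 conv
# The 1x1 convolution then maps c_head channels to 40, and global pooling gives 40 EDT entries.
print(el_path)              # [48, 168, 134, 254, 203, 323]
print(c_cat, c_cb, c_head)  # 648 768 948
```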

Experiment 1: CNN Training and Prediction Quality
To investigate whether the quality parameters listed in Table 1 can be predicted, a CNN as described in Section 3.2 was trained, which received EL and IR images and reflectance values as input. The images were scaled to a size of 224 × 224 px and standardized using mean and standard deviation. For augmentation, they were randomly rotated by multiples of 90° and flipped horizontally and vertically. The model was trained on the training dataset for 250 epochs with a batch size of 40 on an Nvidia GeForce RTX 2080 Ti. For this purpose, the Adam optimizer [57] and the mean absolute error loss were used. Learning rates between $10^{-3}$ and $10^{-4}$ and decays between $10^{-3}$ and $10^{-10}$ were tested in a grid search. During training, a plateau scheduler reduced the learning rate by a factor of 10 if the relative error on the validation dataset did not decrease over 20 epochs. Based on the validation dataset, the best model was then selected and evaluated on the test dataset to ensure high prediction quality before examining the EDTs.
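The plateau rule described above can be written as a small stateful object. This is a minimal sketch of one reasonable interpretation; the class name is hypothetical. Library schedulers such as PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau` (with `factor=0.1`, `patience=20`) provide comparable behavior but may count the waiting period slightly differently.

```python
class PlateauScheduler:
    """Divide the learning rate by `factor` once the monitored validation error
    has not decreased for `patience` consecutive epochs."""

    def __init__(self, lr, factor=10.0, patience=20):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_error):
        if val_error < self.best:  # improvement: remember it, reset the counter
            self.best = val_error
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr /= self.factor
                self.bad_epochs = 0  # start a new waiting period
        return self.lr


sched = PlateauScheduler(lr=1e-3)
lrs = [sched.step(err) for err in [0.5] + [0.5] * 20]  # one improvement, then a plateau
print(lrs[19], lrs[20])
```

After one improving epoch followed by 20 epochs without improvement, the learning rate drops from $10^{-3}$ to $10^{-4}$.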

Experiment 2: Explorative Analysis of the EDT
To investigate whether the EDTs are meaningful in terms of process defects, the entire dataset was labeled. For this purpose, an expert assessed all measurement images so that the main losses were known for each cell. Some defects, such as shunts, minor local hotspots, and edge isolation defects, were clearly visible in the IR image. Others, such as finger interruptions, overfiring defects, or scratch and stripe patterns, were better seen in the EL image. Afterward, the EDTs of the whole dataset were computed and examined exploratively. Because the 720-dimensional EDTs cannot be viewed directly, the t-distributed stochastic neighbor embedding (t-SNE) algorithm was used to reduce them to two dimensions. [58] This reduction served exclusively for visualization, so that groupings of EDTs could be examined. Points that are close in the high-dimensional space are also close to each other in the low-dimensional space. However, larger distances in the low-dimensional space cannot be directly compared with each other and are therefore less meaningful.

Experiment 3: Defect Detection and Human-in-the-Loop
In this experiment, defect detection and the human-in-the-loop approach were investigated. Based on the images, an expert labeled the complete dataset into the defect classes finger interruptions, poor edge isolation, overfired regions, and hotspots and shunts; a sample could be assigned to multiple classes. First, to investigate defect detection based on EDTs in general, an NN for defect classification was trained with fivefold cross-validation. Subsequently, this supervised reference was compared with the human-in-the-loop approach from Section 3.3, also with fivefold cross-validation. The aim was to determine how many labeled samples are necessary for comparably effective defect detection. For this purpose, $n_{cl} = 100$ samples were selected initially, and the same model was trained on them. Then, the uncertainty $U$ of each unused EDT in the training dataset was calculated by Equation (2) with $T = 200$. The 100 most uncertain samples were appended to the labeled samples, and the NN was fine-tuned. These steps were repeated until 1000 samples were reached.
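The sample-selection logic of this experiment can be sketched without any ML library. The sketch has three parts: an initial selection via k-means, taking the EDT nearest to each centroid; a top-$n$ selection of the most uncertain unlabeled samples; and the labeling budget of 100 initial samples plus 100 per round up to 1000. The plain Lloyd's k-means below is written out only to stay self-contained; in practice, a library implementation such as scikit-learn's `KMeans` would normally be used. The toy 2D "EDTs", uncertainty values, and function names are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns the k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def initial_selection(edts, n_cl, seed=0):
    """Step 1: index of the EDT closest to each centroid (duplicates collapse)."""
    centroids = kmeans(edts, n_cl, seed=seed)
    return sorted({min(range(len(edts)), key=lambda i: sq_dist(edts[i], c))
                   for c in centroids})


def most_uncertain(uncertainties, labeled, n):
    """Step 4: the n not-yet-labeled samples with the largest uncertainty U."""
    pool = [i for i in range(len(uncertainties)) if i not in labeled]
    return sorted(pool, key=lambda i: uncertainties[i], reverse=True)[:n]


# Labeling budget of experiment 3: 100 initial samples, +100 per round, up to 1000
budget = list(range(100, 1001, 100))
print(len(budget), budget[-1])  # 10 1000

toy_edts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
print(initial_selection(toy_edts, n_cl=2))                     # [0, 3]
print(most_uncertain([0.1, 0.6, 0.3, 0.5], labeled={1}, n=2))  # [3, 2]
```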
After each iteration, the results were compared with those of the supervised reference. The prediction quality of the classification models was judged by four counts:

- a cell with the defect that was predicted as defective was counted as a "true positive" (TP);
- a cell without the defect that was predicted as defective was counted as a "false positive" (FP);
- a cell without the defect that was predicted as nondefective was counted as a "true negative" (TN); and
- a cell with the defect that was predicted as nondefective was counted as a "false negative" (FN).

Several quantities can be derived from these counts. The precision, defined in Equation (3), indicates how many defect predictions were actually defects.

$$\text{precision} = \frac{TP}{TP + FP} \qquad (3)$$

The recall, defined in Equation (4), measures how many of all defects were found.

$$\text{recall} = \frac{TP}{TP + FN} \qquad (4)$$

The $F_1$-score, defined in Equation (5), is the harmonic mean of precision and recall.

$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \qquad (5)$$

Accuracy is not used because the defect class distributions are imbalanced. It would therefore be dominated by the majority class and would lead to noncomparable results.
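Equations (3)-(5), and the reason for avoiding accuracy, can be checked on a small hypothetical example with 3 defective cells out of 10. A trivial model that never predicts a defect still reaches 70% accuracy, but its $F_1$-score is 0.

```python
def precision_recall_f1(y_true, y_pred):
    """Equations (3)-(5) for one binary defect class (1 = defect, 0 = no defect)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical imbalanced class: 3 defective cells out of 10
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_model = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]  # TP = 2, FP = 1, FN = 1
y_never = [0] * 10                         # always predicts "no defect"

print(precision_recall_f1(y_true, y_model), accuracy(y_true, y_model))
print(precision_recall_f1(y_true, y_never), accuracy(y_true, y_never))  # (0.0, 0.0, 0.0) 0.7
```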

Experiment 4: Spatially Resolved Defect Detection
For the four NNs trained in experiment 3, some example cells were examined qualitatively with respect to the defect classes finger interruptions, poor edge isolation, overfired regions, and hotspots and shunts. A quantitative investigation is not possible because no spatially resolved labels are available.

Experiment 1-Results: CNN Training and Prediction Quality
The trained CNN accurately derives the cells' quality parameters from their measurement images. Table 1 lists the absolute error and the correlation coefficient between the predicted and measured values for all cell parameters. The correlation coefficients for most of the parameters are above 0.8, reaching up to 0.98. However, the short-circuit current density $J_{sc}$ and the grid resistance on the rear side $R_{grid,re}$ are exceptions. Therefore, Figure 5 shows the predictions over the measured values as 2D histograms for these two parameters as well as for $V_{oc}$ and $\eta$. The number of samples per bin is indicated by the color bar and by the histograms at the edges. For an ideal prediction, all points would lie on the black diagonal. Panels (a) and (c) show that the prediction works well for $V_{oc}$ and $\eta$, as it does for most of the other parameters.
The prediction for $J_{sc}$ in (b) is also distributed around the diagonal. The distribution for $R_{grid,re}$ in (d) is close to the diagonal as well, but a region at 13 Ω m⁻¹ is not well predicted, so the correlation coefficient is low in this case.
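The effect described for $R_{grid,re}$ can be illustrated with the two error measures involved. This sketch assumes that the absolute error in Table 1 is the mean absolute error and that the correlation coefficient is Pearson's $r$. The data are invented: most points lie on the diagonal, but the cells measured at one value are scattered, loosely mimicking the poorly predicted region in Figure 5d. This single region is enough to lower $r$ noticeably.

```python
import math


def mean_absolute_error(measured, predicted):
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def pearson_r(measured, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


# Hypothetical data: most points lie on the diagonal, but the cells measured
# at 13 (mimicking a poorly predicted region) are scattered.
measured = [10.0, 11.0, 12.0, 13.0, 13.0, 13.0, 14.0, 15.0]
good = list(measured)
scattered = [10.0, 11.0, 12.0, 10.5, 15.5, 11.0, 14.0, 15.0]
print(round(pearson_r(measured, good), 3), round(pearson_r(measured, scattered), 3))
print(mean_absolute_error(measured, scattered))
```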

Experiment 2-Results: Explorative Analysis of the EDTs
The trained model has converged as described in Section 5.1, so the EDTs can be calculated and studied. The EDTs contain information about quality-relevant process defects. Figure 6 shows the low-dimensional embedding of the dataset. The x- and y-axes describe the unitless values computed by the t-SNE algorithm, and each point is colored according to its efficiency $\eta$. Several clusters with reduced efficiency can be identified, as well as a large cluster that occupies much of the embedding. The clusters contain cells with the same defect type, as highlighted by the annotations, and are examined in more detail in the following.

The three smaller clusters in the lower right of Figure 6 contain finger interruptions, overfired regions, and stripes and scratches, respectively. The cluster directly above them contains shunts. These clusters are clearly separated from each other and show reduced efficiency. Typical EL and IR images of the four clusters are shown on the left of Figure 7. Finger interruptions, overfired regions, and stripes are clearly distinguishable in the EL image, and the shunt can be seen in the lower left of the IR image. The EL image shows the microcrack that caused the shunt. The t-SNE graph also indicates a progression in efficiency within the clusters; e.g., more severe finger interruptions lie further down in the cluster than less severe ones.
The larger cluster in the upper right area contains cells with large-area defects, such as poor edge isolation and diffuse temperature distributions appearing in the IR image. Here, a progression in size can be seen: larger edge defects lie further toward the outside of the cluster, which is also reflected in lower efficiency. A typical example is shown in the first row of the right column of Figure 7, where the left and upper edges of the cell are heated up in the IR image. This group also contains samples with diffuse temperature distributions, as shown in the row below. In this case, the IR image shows an increased temperature over a larger area of the wafer.
The rest of the large cluster contains mostly smaller hotspots or mixed defects of shunts and overfired regions. Cells that show both overfired regions and shunts can be found in a separate cluster; example images are shown in the fourth row of the right column of Figure 7. The large cluster on the left side contains smaller hotspots that do not have a large impact on efficiency. It splits into two major regions, with the cells in the upper region having an increased bow. Above this cluster, there are three cells in which the busbar region shines brightly in the EL image, as shown in the third row of the right column of Figure 7.

Experiment 3-Results: Defect Detection and Human-in-the-Loop
As a reference for the human-in-the-loop approach, four networks were trained in a supervised fashion to detect the process defects finger interruptions, edge isolation defects, shunts, and overfiring defects. These small NNs consisted of three linear layers, each followed by a ReLU activation function and dropout. They were trained to perform a binary classification of the respective defect type from the EDT $\rho$ received as input.

Figure 5. Comparison of measured and predicted quantities. In the graphs, the measured value is shown on the x-axis and the predicted value on the y-axis. a) $V_{oc}$, b) $J_{sc}$, c) $\eta$, and d) the grid resistance on the rear side $R_{grid,re}$.
For the quantitative evaluation of the defect classification, a fivefold cross-validation was performed. Table 2 lists the $F_1$-score, recall, and precision for these defect types in columns 2-4. In terms of the $F_1$-score, the best value is achieved for finger interruptions (0.972), followed by hotspots and shunts (0.959), poor edge isolation (0.861), and firing defects (0.762).
With the human-in-the-loop approach described in Section 3.3, equal or even slightly better results can be achieved with significantly fewer labeled EDTs. The same NN as in the supervised reference was used. In each iteration, 100 labeled EDTs were added to the training dataset, selected based on their uncertainty U(ρ) from Equation (2). These results are also fivefold cross-validated. Figure 8 plots the F1-score over the number of labeled samples for the targeted defect classes. The supervised reference (Sup. Reference), trained with 1250 labeled samples, is shown with filled symbols for each defect, together with a horizontal line at its F1-score as a guide to the eye. The open symbols represent the results of the human-in-the-loop approach.
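One selection step of this loop can be sketched as follows. Equation (2) is not reproduced in this section, so the uncertainty used here, the binary entropy of the mean prediction over T forward passes with dropout kept active (Monte Carlo dropout), is an assumed stand-in for U(ρ); the function names are hypothetical. Only the batch size of 100 is taken from the text.

```python
import numpy as np

def predictive_entropy(mc_probs):
    """Binary entropy of the mean P(defect) over T stochastic passes.

    mc_probs: array of shape (T, N), one row per forward pass with dropout active.
    Assumed stand-in for U(rho); Equation (2) of the study may differ.
    """
    p = np.clip(mc_probs.mean(axis=0), 1e-12, 1.0 - 1e-12)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def select_for_labeling(mc_probs, unlabeled_idx, batch_size=100):
    """Return the indices of the unlabeled EDTs with the highest uncertainty."""
    unlabeled_idx = np.asarray(unlabeled_idx)
    u = predictive_entropy(mc_probs[:, unlabeled_idx])
    order = np.argsort(-u, kind="stable")     # most uncertain first
    return unlabeled_idx[order[:batch_size]]
```

In the loop, the expert labels the returned EDTs, they are moved to the training set, the classifier is retrained, and the selection is repeated on the remaining unlabeled EDTs.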
With only 200-400 labeled samples, this approach achieves F1-scores comparable to those of the supervised baseline. The human-in-the-loop approach not only requires fewer labeled samples but also provides better detection rates than the supervised reference: as shown in Figure 8, its F1-scores stabilize above the respective supervised reference. The corresponding F1-score, recall, and precision values can be found in Table 2 in columns 4-6. For the defect types finger interruptions, poor edge isolation, hotspots and shunts, and firing defects, F1-scores of 0.991, 0.897, 0.963, and 0.782, respectively, were achieved. On average, the F1-score thus increased by about 2 percentage points.
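The average gain can be checked directly from the F1-scores quoted in this section. The calculation shows an improvement of about two percentage points in absolute terms and about 2.3% relative to the supervised scores, so the stated figure of roughly 2% holds under either reading.

```python
# F1-scores quoted in the text: supervised reference vs. human-in-the-loop.
supervised = {"finger interruptions": 0.972, "hotspots and shunts": 0.959,
              "poor edge isolation": 0.861, "firing defects": 0.762}
human_in_loop = {"finger interruptions": 0.991, "hotspots and shunts": 0.963,
                 "poor edge isolation": 0.897, "firing defects": 0.782}

abs_gain = [human_in_loop[k] - supervised[k] for k in supervised]
rel_gain = [(human_in_loop[k] - supervised[k]) / supervised[k] for k in supervised]

mean_abs = sum(abs_gain) / len(abs_gain)   # about 0.0198, i.e. ~2 percentage points
mean_rel = sum(rel_gain) / len(rel_gain)   # about 0.0229, i.e. ~2.3 %
print(f"mean absolute gain: {mean_abs:.4f}, mean relative gain: {mean_rel:.2%}")
```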

Experiment 4-Results: Spatially Resolved Defect Detection
For very distinct defects, the trained models from experiment 3 can also be used for spatially resolved defect segmentation, as described in Section 3.4. Figure 9 shows four typical measurement images for the defect classes finger interruptions, firing defect, poor edge isolation, and shunts and hotspots. The region that the models from experiment 3 assess as defective is circled in red. Defects of the classes finger interruptions and firing defect can be coarsely detected in a spatially resolved manner: in Figure 9a,b, the defect regions in the EL images are enclosed quite accurately. In both cases, there is still a gap between the defect and nondefect regions. Material defects, such as the one in the upper left quarter of the cell in (a), are not misclassified as defects.
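Section 3.4 is not reproduced here, so the following NumPy sketch illustrates only one plausible way of transferring an EDT classifier to a spatial map. It assumes that the EDT is obtained by global average pooling of a C × H × W feature map; under that assumption, the same classifier can be applied to the C-dimensional feature vector at every spatial position, and the resulting probability map can be thresholded and upsampled to image resolution. The logistic stand-in classifier, the threshold of 0.5, and all function names are hypothetical.

```python
import numpy as np

def defect_probability_map(feature_map, classify):
    """Apply an EDT classifier to every spatial position of a (C, H, W) feature map.

    classify maps an (N, C) array of feature vectors to N defect probabilities.
    """
    c, h, w = feature_map.shape
    vectors = feature_map.reshape(c, h * w).T     # one C-dim vector per position
    return classify(vectors).reshape(h, w)

def segment(feature_map, classify, upscale, threshold=0.5):
    """Threshold the probability map and upsample it (nearest neighbor) to image size."""
    mask = defect_probability_map(feature_map, classify) >= threshold
    return np.repeat(np.repeat(mask, upscale, axis=0), upscale, axis=1)

def make_logistic(weights, bias):
    """Hypothetical stand-in for a trained defect classifier."""
    return lambda x: 1.0 / (1.0 + np.exp(-(x @ weights + bias)))
```

Because the classifier was trained on pooled vectors rather than on local ones, such a map can only be expected to localize defects coarsely, which is in line with the behavior reported here.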
Edge isolation problems can be detected with spatial resolution, whereas spatially resolved detection of shunts by the models from experiment 3 failed in our experiments. Figure 9c,d shows example IR images of an edge isolation defect and a shunt. In (c), the edge isolation defect is well detected at the left edge, but further regions in the center and in the right corners are detected as false positives. As (d) shows, this approach fails to localize the shunt.

Discussion
We have shown that a CNN can derive quality parameters from measurement images, so that the EDT can be trained to capture information from the EL, IR, and reflectance data. The influence of the individual input data and predicted parameters on the representation has not yet been studied. For an exploratory analysis of the EDTs, the 720-dimensional feature vectors were visualized in a low-dimensional embedding space, in which clusters of cells with similar defects were found. This indicates that the EDTs are meaningful representations of the measurement images. The EDT can thus be used to compare measurement images quantitatively in terms of visible defects and cell properties. The network design, the selection of input and output data, and the size of the digital twin should be optimized in follow-up studies. In particular, the influence of the number of entries in the EDT should be examined. The effect of the individual predicted parameters on the representation, and how many and which of them are necessary for a meaningful digital twin, should also be investigated.
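The exploratory analysis and the quantitative comparison of EDTs can be illustrated with two small NumPy functions. PCA is used here only as a simple linear stand-in for the low-dimensional embedding (this part of the paper does not name the embedding method, and nonlinear methods such as t-SNE or UMAP are common choices), and Euclidean distance is an assumed similarity measure for retrieving cells with similar EDTs. The function names are hypothetical.

```python
import numpy as np

def pca_embedding(edts, n_components=2):
    """Project (N, D) EDT vectors onto their leading principal components."""
    centered = edts - edts.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def most_similar_cells(edts, query, k=5):
    """Indices of the k cells whose EDTs are closest (Euclidean) to cell `query`."""
    dist = np.linalg.norm(edts - edts[query], axis=1)
    dist[query] = np.inf                      # exclude the query cell itself
    return np.argsort(dist, kind="stable")[:k]
```

If cells with similar defects produce similar EDTs, the nearest neighbors of a defective cell should mostly share its defect type, which is the property the cluster analysis relies on.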
The digital twin of the solar cell can be used to derive sorting criteria for process defects, and the human-in-the-loop approach presented here is an efficient method for defining them. In the experiments, it reduced the labeling effort by a factor of 4-6, depending on the defect investigated. Within only a few interactions, expert knowledge is integrated into a classification scheme, enabling a user-specific sorting procedure. Surprisingly, the detection rate was even higher than that of the supervised reference. We suspect that the iterative sampling of uncertain samples leads to a more balanced class distribution within our labeled data, reducing biases in the training data. Another advantage of this method is that, because the labeling effort is low and the relevant information is expected to be contained in the EDT, solar cell manufacturers can define defect classes tailored to their individual cell line and train classifiers for them quickly. The iterative approach also allows the models to be adapted continuously as the production process evolves. We used NNs with dropout for defect prediction and uncertainty calculation; however, other methods such as SVMs are also possible as long as they can provide an additional uncertainty output. [59,60]

The sorting schemes were applied successfully to classify the EDTs of solar cells into different defect classes. In addition, a transfer to spatially resolved feature maps was investigated, which roughly identifies local defect structures without further labeling. For finger interruptions and firing defects, we achieved good segmentations. Edge isolation defects could also be found, although additional regions were falsely detected as defects. Shunts could not be detected locally. We suspect that these errors are caused by mixed defects and that the approach requires separate clusters in which the defect occurs dominantly. This is the case for finger interruptions and firing defects, less so for edge isolation defects, and hardly at all for hotspots and shunts. Furthermore, at an excitation current of 20 A, series resistance defects such as finger interruptions and overfiring are easier to identify in EL images than parallel resistance defects such as shunts and hotspots.
Figure 9. Example spatially resolved defect detection. The detected defect areas are circled in red for a) finger interruptions in the EL image, b) a firing defect in the EL image, c) an edge isolation defect in the IR image, and d) a shunt in the IR image.

Apart from the investigated properties, the digital twin of the solar cell promises to be easily transferable between multiple cell lines and helpful for process optimization. Because the images are correlated with measured quality parameters, the time-consuming, costly, and error-prone labeling process can be bypassed. The approach can thus be adapted quickly to new cell lines, requiring only the respective measurements before CNN training. As shown, the digital twin holds process-related information derived from the images that goes beyond the IV characteristics. It therefore seems promising to use it for process optimization, thereby incorporating image information into the optimization.

Conclusion
We have introduced the empirical digital twin (EDT) of a solar cell, which contains features describing the cell's electrical quality that are derived from measurement images. For this purpose, a deep neural network is trained to correlate high-dimensional measurement images with IV parameters, so that the EDT holds image features related to these quantities. As an example of this general approach, we use electroluminescence and thermography images as well as reflectance values to predict a total of 18 parameters. The digital twins can be used for quality inspection. Because similar cells lead to similar digital twins, they form clusters of the same defect and quality type in the high-dimensional feature space. In this way, changes in cell processing can also be made observable using measurement images. The EDTs are suitable for deriving sorting criteria. Within a human-in-the-loop approach, expert knowledge can be integrated into the defect detection process with little labeling effort. The efficient, iterative labeling process enables user-adapted sorting schemes specific to the different cell lines and the defects sought. Compared to the supervised reference, the F1-score for the considered defect types was increased by about 2 percentage points on average, reaching 0.99 for finger interruptions, 0.96 for hotspots and shunts, 0.90 for edge isolation defects, and 0.78 for inhomogeneous contact formation due to the firing process. Finally, we have shown that the networks for defect detection can also be applied for spatially resolved identification of the respective defects without the need for further labeling.