PCB Soldering Defect Inspection Using Multitask Learning under Low Data Regimes

To increase the reliability of the printed circuit board (PCB) manufacturing process, automated optical inspection is often employed for soldering defect detection. However, traditional approaches built on handcrafted features, predefined rules, or thresholds are often susceptible to variations in the quality of the acquired images and give unstable performance. To solve this problem, a deep learning‐based soldering defect detection method is developed in this article. As in many real‐life deep learning applications, the number of available training samples is often limited. This creates a challenging low‐data scenario, as deep learning typically requires massive data to perform well. To address this issue, a multitask learning model, namely, PCBMTL, is proposed that can simultaneously learn the classification and segmentation tasks under low‐data regimes. By acquiring the segmentation knowledge, classification performance is substantially improved with few samples. To facilitate the study, a soldering defect image dataset, namely, PCBSPDefect, is built. It focuses on the dual in‐line packages (DIP) at the PCB back side, DIP at the PCB front side, and flat flexible cables. Experimental results show that the proposed PCBMTL outperforms the best existing approaches by over 5% to 17% in average accuracy, depending on the dataset.


Introduction
Quality assurance of the printed circuit boards (PCBs) and PCB assemblies (PCBAs) is vital for electronic product manufacturing. Instead of relying on laborious, costly, and subjective manual inspection, an intelligent automatic optical inspection (AOI) system can be employed to detect defects and aid human operators in decision-making. By utilizing such a system, the time required for inspecting soldering defects can be reduced, lowering both labor and time costs. However, traditional AOI systems for PCB soldering defect detection rely heavily on the quality of the acquired PCB images. [3][4][5][6][7] Besides, these traditional approaches are often computationally intensive, which affects their real-time performance. The rise of deep learning has spurred the advancement and implementation of AOI systems. [8][11][12][13] Vanilla deep learning models are used and trained with thousands of PCB defect images. In practice, however, soldering defects are uncommon in most industrial-grade PCB manufacturing processes. It takes a long time and substantial manpower to collect sufficient PCBs with soldering defects to construct a dataset for deep neural network (DNN) training. This is particularly difficult for new production lines, where the AOI system needs to be in place before production starts. The PCB samples that can be used for model training are therefore rather limited.
In this article, we propose to adopt the multitask learning (MTL) method for PCB soldering defect detection under the low-data regime. Specifically, we propose to add another head to the model to segment the soldering points. Segmentation tasks are generally more difficult than classification tasks, as they involve predicting the class of every pixel. We hypothesize that if a model is capable of achieving good segmentation of the solder regions in a PCB, it should have learned rich semantic features, leading to an improvement in the feature representation learning for PCB images and ultimately enhancing the classification performance of soldering defects. Our strategy to address the low-data problem is therefore to utilize the semantic feature knowledge obtained from the segmentation mask to assist the low-data training. Because the segmentation task requires extra labels of soldering point positions, the proposed approach can also be viewed as tackling the low-data problem by using more labels in lieu of more training data.
To verify the above idea, a new PCB soldering defect detection model called PCBMTL is proposed. The new model has a U-Net-like structure but has two heads for segmentation and defect classification, respectively. [14] For training the model, an image dataset with PCB soldering defects is needed. [17][18][19][20][21][22][23] For this reason, a new PCB soldering defect dataset, namely, PCBSPDefect, is constructed. [24] The dataset contains images of PCB soldering points of three component parts, namely, dual in-line packages (DIP) located on the PCB back side (BDIP), DIP on the PCB front side (FDIP), and the flat flexible cables (FFC), as depicted in Figure 1. These components are typical in conventional electronic circuit boards. Ten classes of soldering defects can be found in the images. They are all typical soldering defects commonly found in PCBs. These images are captured from real defective PCBs. The dataset will be made publicly available for research purposes.

Related Works
AOI for soldering defect detection has been widely studied for decades. The early studies focused on traditional rule-based image processing techniques for detecting soldering defects. [1,2] While they might use statistical or probabilistic techniques, they were all rule-based approaches. Even in recent years, there were still some rule-based methods used for detecting the defects of PCB through-holes and water pump PCB soldering points. [3,4] These rule-based methods relied on handcrafted features, predefined rules, or thresholds, which were difficult to generalize and led to inferior performance in new applications.
In recent years, machine learning (ML) techniques such as K-means clustering, artificial neural networks (ANN), and multilayer perceptron (MLP) have emerged to become the main tools for classification tasks. [25,26][7] However, these approaches still use handcrafted features. In addition, ANN or MLP models are relatively shallow and may not be able to aggregate image representation features, which can limit the overall performance. [6,7] To solve the problem, deep learning techniques were developed and demonstrated to outperform traditional ML methods. [8] They were applied to different defect detection tasks, such as detecting the defects of contemporary artworks and carbon fiber composites. [27,28] DNN techniques were also used in PCB soldering defect detection. For instance, object detection models were used for simultaneously localizing and classifying soldering defects. [29,30] A rule-based approach was used to crop out the soldering points, followed by a convolutional neural network (CNN) to classify whether the soldering point was normal or not. [9] CNN was also adopted for USB soldering point classification after localizing the USB connections. [10] VGGNet, a variant of CNN, was modified to classify abnormal solder joints. [11,12] For soldering defect classification in X-ray modality, 2D CNN and 3D CNN were designed for detecting defects in 2D and 3D X-ray images, respectively. [13] While these approaches achieve some success in defect detection, they require a large number of images (at least 4000) for training, which is hard to obtain in many practical situations.

Learning Approaches for Low-Data Scenarios
One of the main challenges of the abovementioned deep learning approaches is the need for large amounts of high-quality labeled data. In many real-world scenarios, obtaining such data can be difficult, time-consuming, and expensive. As this is a common problem in deep learning research, various approaches for low-data regimes have been developed to address it. One approach is pretraining or transfer learning. [33] Rich feature representations of the data are learned and then adapted to a smaller dataset through fine-tuning. Although self-supervised learning (SSL) has shown promise in learning representations for downstream applications without domain data, the high cost of the required computational resources and the long training time make the solution expensive for end users. While it is possible to use SSL pretrained models to avoid the training process, such models are available only for a few popular cases and may not suit the requirements of PCB soldering defect detection.
Data synthesizing is another common approach for dealing with the low-data problem. Negative samples can be synthesized from the positive samples (which are assumed to be obtained more easily) through image processing or deep learning techniques. [34,35] However, the accuracy of the model will largely depend on the accuracy of the synthesizer, which is not available for general PCB soldering defects. Another approach is data augmentation to generate samples for training. This can be done by simple operations such as random cropping, rotating, and flipping image samples, or more sophisticated approaches such as Copy-Paste and using generative adversarial networks (GAN). [34,35] Data augmentation is commonly practiced in deep learning applications for reducing overfitting. When it is used to solve the low-data problem, however, the variability or quality of the augmented images is not high enough to replace true soldering defect samples.
Multitask learning (MTL) is another approach that can be used under low-data regimes. In MTL, a single model is trained to perform multiple related tasks simultaneously, sharing and learning from common representations. This approach can leverage task-specific information to improve the model's performance on each task, even with limited data. The shared representations learned by the model can capture common features across tasks and help regularize the model, reducing overfitting to the limited data. In this article, the MTL technique is applied to PCB soldering defect detection under low-data regimes. It is the first work that addresses the low-data problem in training PCB soldering defect detection models. When the available training data is only 10% of a normal dataset, the proposed MTL approach still achieves detection accuracies above 75% on BDIP and above 80% on FFC, which significantly outperforms the conventional approaches.

Contributions
The contribution of this work is threefold:
1) We propose a novel PCB soldering defect detection DNN model, namely, PCBMTL. It adopts the multitask learning (MTL) approach, which improves classification accuracy when only a limited number of PCB samples is available. This is achieved by leveraging the acquired segmentation knowledge, which helps the model better understand the features and characteristics of the different components on the PCB. By training the model to perform both segmentation and classification tasks simultaneously, we can improve the model's ability to accurately classify defects, even if only a limited number of training samples are available.
2) We developed a new PCB image dataset, namely, PCBSPDefect, for soldering defect detection. [24] The dataset contains images of PCB soldering points of DIP and FFC components with 10 classes of soldering defects. This dataset provides not only image-level labels but also pixel-wise binary segmentation masks that indicate the regions where the PCB has solder. The dataset will be released for public download.
3) We have evaluated the performance of our PCBMTL model using the PCBSPDefect dataset. The results show that the proposed PCBMTL outperforms state-of-the-art methods under different amounts of training data. This indicates the potential of PCBMTL for effective PCB soldering defect classification using limited data. Additionally, the segmentation results can provide valuable insights into the model's decision-making process and can help identify areas for further improvement.

New PCB Soldering Defect Image Dataset
As mentioned in Section 1, there is no open-sourced PCB soldering defect dataset publicly available. To facilitate the study, a new PCB soldering defect image dataset was developed. The new dataset contains soldering point images of DIP and FFC components. The images were captured using a camera placed under a white ring light, as illustrated in Figure 2a. A jig was used to fix the position of the PCBs, enabling easy cropping of BDIP, FDIP, and FFC soldering point images, as depicted in Figures 1 and 2b.
The soldering defects in BDIP are divided into four classes, namely, missing solder, insufficient solder, excessive solder, and bridging solder. In FDIP, there are two classes: excessive solder and bridging solder. For FFC, the defect classes are bridging pins, dirty pins, lifted pins, and shifted pins. Normal PCB images are also added to each category. Every image contains two soldering points or pins so that the problems of bridging solder, bridging pins, and shifted pins can be clearly seen. It also enables us to generalize the dataset to PCBs with different numbers of DIP and FFC soldering points or pins. The images were captured under three brightness levels, as displayed in Figure 2b, to simulate various lighting conditions in real environments. To generate the ground-truth binary segmentation mask, we utilized the LabelMe annotation software to label the pixels with solder as foreground (white) and the remaining pixels as background (black).
Using the above setup, three datasets were created for BDIP, FDIP, and FFC, respectively, with around 6,000 images in total. Note that different data augmentation methods were applied to generate the images in the datasets. The details of each dataset are summarized in Table 1. To simulate the low-data scenario, a 20:20:60 split ratio is employed for the training, validation, and test sets, respectively. The exact numbers of the training images are shown in the last row of Table 1. To study the effect of training data size, the training set is further reduced in steps of 2% down to 10% while keeping the validation and test sets unchanged. Each smaller training set is a proper subset of a larger training set, i.e.,
$$D_{\mathrm{Train}}^{(p)} \subset D_{\mathrm{Train}}^{(q)}, \quad p < q, \tag{1}$$
where $D_{\mathrm{Train}}^{(p)}$ and $D_{\mathrm{Train}}^{(q)}$ are the training sets with $p$ and $q$ percent of the total training samples, respectively. The sets are also balanced to ensure an equal proportion of different classes. To avoid overlapping, the same sample captured with different lighting levels is only included in one of the three sets.
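The nested, class-balanced subsets described above can be produced in several ways; the following is only a minimal sketch, not the authors' procedure. It assumes that each class is shuffled once and that every subset takes a prefix of that fixed order, which guarantees that a smaller subset is contained in a larger one. The function name `nested_balanced_subsets`, the seed, and the toy pool of 5 classes with 100 samples each are hypothetical.

```python
import random
from collections import defaultdict


def nested_balanced_subsets(labels, percents, seed=0):
    """Return {p: sorted sample indices} with nested, class-balanced subsets.

    labels   : class label of every sample in the pool (hypothetical input)
    percents : percentages of each class to keep, e.g. [10, 12, ..., 20]
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle each class once. Every subset is a prefix of the same order,
    # so the subset for p is always contained in the subset for q > p.
    for indices in by_class.values():
        rng.shuffle(indices)

    subsets = {}
    for p in percents:
        chosen = []
        for indices in by_class.values():
            keep = len(indices) * p // 100  # same fraction from every class
            chosen.extend(indices[:keep])
        subsets[p] = sorted(chosen)
    return subsets


# Toy pool: 5 classes with 100 samples each (illustrative numbers only).
labels = [c for c in range(5) for _ in range(100)]
subsets = nested_balanced_subsets(labels, [10, 12, 14, 16, 18, 20])
```

Taking prefixes of one fixed shuffle keeps the comparison across training percentages fair, because a larger training set only ever adds samples to a smaller one.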

Proposed Multitask Learning Framework
The main concept behind MTL is to extract and utilize the shared knowledge and information among multiple tasks so that the MTL model can learn multiple tasks concurrently, thereby enhancing the learning efficiency by exploiting the common features across tasks. It is adopted in the design of the proposed model PCBMTL to deal with the low-data problem due to the difficulties in collecting training samples for PCB soldering defect detection, as mentioned in Section 1. The proposed PCBMTL is an end-to-end CNN model that integrates classification and segmentation. This model is capable of taking an input soldering point (pin) image of BDIP, FDIP, or FFC, and generating two outputs: a binary pixel-wise segmentation map of the regions with solder and an image-level soldering defect type prediction. The model architecture depicted in Figure 3 is an encoder-decoder structure, which is similar to some popular structures for image segmentation such as U-Net. [14] The proposed architecture comprises three parts: 1) an encoder for high-level feature representation learning, 2) a decoder or a segmentation branch for pixel-wise segmentation, and 3) an additional branch for classification. Specifically, given an image $I \in \mathbb{R}^{H \times W \times C}$, where $H \times W$ is the corresponding pixel-wise spatial resolution and $C$ is the number of color channels, the proposed model predicts the segmentation map $M \in \mathbb{R}^{H \times W}$ and the corresponding image-level soldering class $\hat{y}$.

Encoder
The input images are gradually encoded into low-resolution high-level feature representations by the encoder, as depicted in the left portion of Figure 3. Each level in the encoding branch contains two convolutional blocks, each consisting of a 3 × 3 convolution layer ($\mathrm{Conv}_{3\times3}$) with a stride of 1, followed by batch normalization (BN) and a rectified linear unit (ReLU) activation function in sequential order. [36,37] Unlike U-Net, our model employs BN, which is useful to stabilize the model training. Downsampling (Down) is achieved through the application of a 2 × 2 max pooling layer with a stride of 2 between two levels, which halves each spatial dimension, while the number of channels is doubled at the next level. Max pooling is used because it is a simple operation that picks the maximum value of the pooling area from the previous layer, meaning that the most representative feature is selected. [38] The feature maps generated at the end of each level prior to downsampling can be expressed formally as
$$E_l = \mathrm{Conv}_{3\times3}\big(\mathrm{Conv}_{3\times3}(\mathrm{Down}(E_{l-1}))\big), \quad l = 2, \dots, L, \tag{2}$$
with $E_1 = \mathrm{Conv}_{3\times3}(\mathrm{Conv}_{3\times3}(I))$, where $E_l$ is the encoder features obtained at level $l$. BN and ReLU are omitted for the sake of simplicity. This procedure is repeated $L$ times in the encoder.
After passing the input image through the encoder, a total of $L$ levels of image feature maps $\{E_l\}_{l=1}^{L}$ are extracted, where $L$ is set to 5. To perform soldering defect classification, a subset of the $E_l$ levels is fed into the classification branch. When using MTL, the extracted features are also passed into the decoder to provide spatial information for predicting a segmentation mask of the regions with solder.
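To make the encoder bookkeeping concrete, the sketch below traces the feature-map shapes through the L = 5 levels. It is only an illustration under stated assumptions: the base width of 64 channels, the 256 × 256 input size, and the padding of 1 in the 3 × 3 convolutions (so that they preserve spatial size) are not given in the text, and `encoder_shapes` is a hypothetical helper.

```python
def encoder_shapes(height, width, base_channels=64, levels=5):
    """Return [(channels, height, width)] for E_1 ... E_L.

    base_channels=64 is an assumed value (borrowed from the original U-Net);
    the text does not state the channel width of the first level.
    """
    shapes = []
    c, h, w = base_channels, height, width
    for level in range(1, levels + 1):
        if level > 1:
            # 2x2 max pooling with stride 2 halves each spatial dimension;
            # the convolutions of the new level double the channel count.
            c, h, w = c * 2, h // 2, w // 2
        # The two 3x3 stride-1 convolutions are assumed to use padding 1,
        # so they leave the spatial size unchanged.
        shapes.append((c, h, w))
    return shapes


# Assumed 256 x 256 input, for illustration only.
for level, shape in enumerate(encoder_shapes(256, 256), start=1):
    print(f"E_{level}: {shape}")
```

Under these assumptions the deepest map E_5 has 16 times fewer rows and columns than the input, which is why it carries mostly semantic rather than spatial information.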

Decoder
In the decoder, the feature maps are upsampled to obtain the predicted segmentation mask $M$ at the input resolution. As depicted in the right part of Figure 3, skip connections are used to concatenate the corresponding encoder features with the decoder features to obtain finer segmentation masks with more spatial details. [31] The fused features are then processed by two convolutional blocks to further enhance the feature representation:
$$D_l = \mathrm{Conv}_{3\times3}\big(\mathrm{Conv}_{3\times3}(\mathrm{Up}(D_{l+1}) \oplus E_l)\big), \quad l = L-1, \dots, 1, \tag{3}$$
with $D_L = E_L$, where $D_l$ is the decoder features obtained at level $l$, Up is the upsampling operation, and $\oplus$ is the concatenation operation. Bilinear interpolation is used for upsampling. The last layer of the decoder consists of a 1 × 1 convolution layer, which reduces the number of channels and produces the predicted binary segmentation map $M$. The sigmoid function is applied to the output to ensure that the values lie between 0 and 1, representing the probability that each pixel has solder. The operation can be described as follows:
$$M = \sigma\big(\mathrm{Conv}_{1\times1}(D_1)\big), \tag{4}$$
where $\sigma$ is the sigmoid activation function. Let $M_i$ be the probability of having solder for the $i$-th pixel in $M$. $M_i$ approaches 1 when the pixel has a high probability of having solder, and approaches 0 otherwise. We hypothesize that the segmentation knowledge of the pixels with solder can improve the performance of the classification task. Notably, when MTL is not employed, the decoder branch is not used for segmentation prediction.

Classification Branch
To classify the soldering defects, a classification branch is added to the end of the network, as shown at the bottom of Figure 3.
The semantic features learned from the segmentation task are extracted and fed as inputs of the classification branch. Specifically, the classification network receives the feature maps $E_{L-1}$, $E_L$, and $D_{L-1}$, which are globally average pooled (GAP) into the vectors $\bar{E}_{L-1}$, $\bar{E}_L$, and $\bar{D}_{L-1}$. Additionally, the predicted binary segmentation map $M$ can provide hints for the classification task, as it indicates the predicted pixels in the image with solder. $M$ is further processed using $K$ consecutive 3 × 3 convolutional and downsampling layers to obtain enhanced features with a smaller spatial size:
$$M_k = \mathrm{Down}\big(\mathrm{Conv}_{3\times3}(M_{k-1})\big), \quad k = 1, \dots, K, \tag{5}$$
where $K$ is set to 3, $M_0$ is the predicted segmentation map $M$, and the downsampling is implemented by a 4 × 4 max pooling layer with a stride of 4. The resulting enhanced feature map $M_K$ is then flattened.
Finally, $\bar{E}_{L-1}$, $\bar{E}_L$, $\bar{D}_{L-1}$, and the flattened $M_K$ are fused by concatenation and fed into the classification branch, which has three fully connected (FC) layers to predict the soldering defect type. ReLU is used for the first two layers while softmax is used for the last layer. [37] A dropout of 0.5 is used for each FC layer. [39] The whole operation is described as follows:
$$\hat{y} = \mathrm{FC}_3\Big(\mathrm{FC}_2\big(\mathrm{FC}_1(\bar{E}_{L-1} \oplus \bar{E}_L \oplus \bar{D}_{L-1} \oplus \mathrm{Flatten}(M_K))\big)\Big), \tag{6}$$
where the first two FC layers each reduce the feature size by half.
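The sizes involved in the fusion step can be worked out by hand; the sketch below does this arithmetic for one set of assumed values. The channel counts (512 and 1024 for the two encoder levels, 512 for the decoder level), the single-channel mask features, the 256 × 256 input, and the 5 output classes (four BDIP defect classes plus normal) are assumptions for illustration, and `classifier_layout` is a hypothetical helper rather than part of the published model.

```python
def classifier_layout(height, width, enc_channels, dec_channels,
                      mask_channels=1, k_stages=3, num_classes=5):
    """Sizes of the fused feature vector and of the three FC layers.

    enc_channels  : (channels of E_{L-1}, channels of E_L), assumed values
    dec_channels  : channels of D_{L-1}, assumed value
    mask_channels : channels kept by the mask convolutions, assumed to be 1
    """
    # K stages of 4x4 max pooling with stride 4 shrink the mask by 4 each time.
    h, w = height, width
    for _ in range(k_stages):
        h, w = h // 4, w // 4
    mask_features = mask_channels * h * w  # length of the flattened M_K

    # Global average pooling turns each feature map into one value per channel.
    fused = enc_channels[0] + enc_channels[1] + dec_channels + mask_features

    # The first two FC layers halve the feature size; the last outputs classes.
    fc1 = fused // 2
    fc2 = fc1 // 2
    return {"mask_map": (mask_channels, h, w),
            "fused": fused,
            "fc_outputs": [fc1, fc2, num_classes]}


layout = classifier_layout(256, 256, enc_channels=(512, 1024), dec_channels=512)
print(layout)
```

With K = 3 the mask shrinks by a factor of 64 per side, so under these assumptions the flattened mask contributes only a handful of values next to the pooled encoder and decoder features.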

Multitask Loss Functions
To train the segmentation branch, two losses are used. The first one is the binary cross-entropy loss $L_{\mathrm{BCE}}$ estimated between the predicted segmentation map $M$ and the ground-truth segmentation map, denoted $G$ here:
$$L_{\mathrm{BCE}} = -\frac{1}{|\Omega|} \sum_{i \in \Omega} \big[ G_i \log M_i + (1 - G_i) \log (1 - M_i) \big], \tag{7}$$
where $\Omega$ is the set of pixels and $|\Omega|$ is the cardinality of $\Omega$, i.e., the number of elements within $\Omega$. The other loss is the Dice loss. Given two countable sets $A$ and $B$, the Dice coefficient is defined as
$$\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}, \tag{8}$$
which is useful to measure the overlap between $A$ and $B$, similar to Intersection over Union (IoU). It can be observed that $\mathrm{Dice}(A, B)$ is maximized at 1 when $A = B$ and minimized at 0 when $A \cap B = \emptyset$. Based on the Dice coefficient, the Dice loss is defined (in its usual soft form) as
$$L_{\mathrm{Dice}} = 1 - \frac{2 \sum_{i \in \Omega} M_i G_i + \varepsilon}{\sum_{i \in \Omega} M_i + \sum_{i \in \Omega} G_i + \varepsilon}, \tag{9}$$
where $\varepsilon$ is a very small value for avoiding numerical issues. Consequently, the segmentation loss is the weighted sum of the binary cross-entropy loss and the Dice loss:
$$L_{\mathrm{Seg}} = L_{\mathrm{BCE}} + \lambda_D L_{\mathrm{Dice}}, \tag{10}$$
where $\lambda_D$ is a hyperparameter to balance $L_{\mathrm{BCE}}$ and $L_{\mathrm{Dice}}$. For classification, the cross-entropy loss is used:
$$L_{\mathrm{CE}} = -\sum_{c} y_c \log \hat{y}_c, \tag{11}$$
where $y$ is the one-hot ground-truth label and the sum runs over the soldering classes. In the end, when MTL is used, the total loss is the weighted sum of the classification loss and the segmentation loss, i.e.,
$$L_{\mathrm{Total}} = L_{\mathrm{CE}} + \lambda_{\mathrm{Seg}} L_{\mathrm{Seg}}, \tag{12}$$
where $\lambda_{\mathrm{Seg}}$ is used to balance $L_{\mathrm{CE}}$ and $L_{\mathrm{Seg}}$. The two hyperparameters $\lambda_D$ and $\lambda_{\mathrm{Seg}}$ were selected empirically based on the validation sets of PCBSPDefect. We found that the network already worked well with $\lambda_D = \lambda_{\mathrm{Seg}} = 1$, although we believe that further fine-tuning them may lead to even better results. When MTL is enabled, the network is trained in an end-to-end manner by minimizing (12), so that for each input image, the network is required to predict the segmentation mask well and classify the defect correctly at the same time. The training details are described in the next section. Note that when MTL is disabled, only the cross-entropy loss $L_{\mathrm{CE}}$ is used for model training, i.e., $\lambda_{\mathrm{Seg}} = 0$. Although minimizing (12) requires the network to learn both tasks well, our focus is on classifying soldering defects, and we therefore aim to utilize the insights gained from the segmentation task to enhance the performance of the classification task. Additionally, the second term in (12) can be viewed as a regularizer that constrains the classification task by the segmentation task, which avoids the learning of redundant features.
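The loss terms described in this section can be written out in a few lines. The sketch below uses plain Python lists of per-pixel probabilities instead of tensors so that it stays self-contained; it is not the authors' implementation. It assumes the common soft Dice form with ε added to both numerator and denominator, clamps probabilities before taking logarithms, and uses hypothetical function names throughout.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Binary cross-entropy averaged over all pixels."""
    total = 0.0
    for p, g in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp so that log() stays finite
        total += g * math.log(p) + (1.0 - g) * math.log(1.0 - p)
    return -total / len(pred)


def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss; the placement of eps is an assumption."""
    intersection = sum(p * g for p, g in zip(pred, target))
    return 1.0 - (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)


def cross_entropy(class_probs, true_class, eps=1e-7):
    """Cross-entropy for one image given softmax outputs."""
    return -math.log(max(class_probs[true_class], eps))


def total_loss(mask_pred, mask_true, class_probs, true_class,
               lambda_d=1.0, lambda_seg=1.0):
    """Classification loss plus weighted segmentation loss."""
    seg = bce_loss(mask_pred, mask_true) + lambda_d * dice_loss(mask_pred, mask_true)
    return cross_entropy(class_probs, true_class) + lambda_seg * seg


# Toy 4-pixel mask and 3-class prediction (illustrative values only).
mask_true = [1.0, 1.0, 0.0, 0.0]
mask_pred = [0.9, 0.8, 0.2, 0.1]
probs = [0.7, 0.2, 0.1]
with_mtl = total_loss(mask_pred, mask_true, probs, true_class=0)
without_mtl = total_loss(mask_pred, mask_true, probs, true_class=0, lambda_seg=0.0)
```

Setting `lambda_seg=0.0` reduces the total to the classification loss alone, which corresponds to the case in the text where MTL is disabled.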

Experimental Results
To evaluate the performance of the proposed multitask learning framework, we used our newly developed datasets BDIP, FDIP, and FFC. The implementation of the proposed model, PCBMTL, was carried out using PyTorch and trained from scratch for 50 epochs with a batch size of 16. [40] Note that the number of iterations equals the number of training samples times the number of epochs, divided by the batch size. The number of iterations is therefore different for different training percentages. The Adam optimizer was used with an initial learning rate of 0.001. [41] Random horizontal flipping was applied for data augmentation during training. As we formulate the problem as a classification task, classification accuracy was used as the evaluation metric. To ensure the accuracy and reliability of our findings, the average results from five independent runs are presented in all cases, which is a common practice under low-data regimes. This is necessary because when models are trained using limited samples with random initialization, the accuracy can fluctuate. Training and testing the model five times and taking the average accuracy across these runs allows us to obtain a more reliable and representative measure of the model performance. Furthermore, as shown in Table 1, the performances of the models at different training percentages, ranging from 10% to 20%, need to be evaluated to study the impact of training data size. Thus, the 5-run average accuracy of each model was measured for each training percentage.
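The two pieces of bookkeeping in this protocol, the iteration count and the five-run average, are simple enough to sketch directly. The training-set sizes (200 and 400 images) and the five run accuracies below are made-up numbers for illustration, not values from Table 1 or from the experiments.

```python
from statistics import mean, stdev


def num_iterations(num_train_samples, epochs=50, batch_size=16):
    """Iterations as stated in the text: samples x epochs / batch size."""
    return num_train_samples * epochs / batch_size


def summarize_runs(accuracies):
    """Mean and sample standard deviation over independent runs."""
    return mean(accuracies), stdev(accuracies)


# Hypothetical training-set sizes for two training percentages.
iters_small = num_iterations(200)
iters_large = num_iterations(400)

# Hypothetical accuracies (in percent) from five independent runs.
run_accuracies = [80.0, 82.0, 78.0, 81.0, 79.0]
avg_acc, spread = summarize_runs(run_accuracies)
print(iters_small, iters_large, avg_acc, round(spread, 2))
```

Because the epoch count is fixed, halving the training set also halves the number of weight updates, so models trained on fewer samples see both less data and fewer iterations. A data loader that keeps the last partial batch would round the per-epoch count up, which this formula ignores.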

Ablation Study of PCBMTL
For selecting different design parameters of PCBMTL, a comprehensive ablation study was conducted. Specifically, we performed ablation experiments on the BDIP dataset by varying the training sample percentage from 10% to 20% of the total. The model was trained using different sizes of training sets, and the corresponding validation accuracy, $\mathrm{Acc}_p$, was measured. We also calculated the average accuracy,
$$\mathrm{Acc}_{\mathrm{Avg}} = \frac{1}{|P|} \sum_{p \in P} \mathrm{Acc}_p, \tag{13}$$
where $P$ is the set of training sample percentages. In this case, $p \in P = \{10, 12, 16, 20\}$.
Table 2 presents the results with different configuration settings. First, by comparing the configuration settings S1 and S2, we can see that S2, using the features of $E_5$ and MTL for classification, achieves an average accuracy of 76.42%, which is higher than the 69.65% accuracy obtained by S1. This demonstrates that leveraging the knowledge acquired from the segmentation task improves the classification accuracy and confirms the effectiveness of the proposed MTL approach.
In our second analysis, we examined the performance of different combinations of the encoder layers $E_4$ and $E_5$ and the decoder layer $D_4$ in settings S2 to S7. Settings S2 to S4 each use single-level features extracted from one of $E_4$, $E_5$, and $D_4$. We observe that using the features from $E_5$ or $D_4$ alone for PCBMTL results in accuracies of 76.42% and 74.57%, respectively, which are much higher than the 51.40% accuracy obtained with $E_4$. This suggests that deeper layers, such as $E_5$ or $D_4$, contain higher-level semantic features, allowing for better knowledge acquisition. We also studied the utilization of multilevel features in settings S5 to S7. We found that S6 ($E_5 + D_4$) and S7 ($E_4 + E_5 + D_4$) achieve much higher accuracies of 82.55% and 82.93% compared to S5 ($E_4 + E_5$) with an accuracy of 77.72%. This again confirms that $E_5$ and $D_4$ contain more useful high-level features for soldering defect classification. In addition, for S5 to S7, features at different levels complement each other, providing higher accuracies than settings S2 to S4.
Third, we also conducted experiments with the inclusion of the soldering segmentation mask feature $M_3$ in settings S8 to S12. Using only $M_3$ in S8 for PCBMTL results in even lower accuracy than S1, which does not use MTL. This suggests that solely using the soldering segmentation mask does not help with soldering defect classification. However, when used in combination with other features, $M_3$ can be useful. For instance, S10 extends S5 ($E_4 + E_5$) with the addition of $M_3$. S10 obtained an accuracy of 82.81%, which is higher than the accuracy of 77.72% by S5. Yet, the performance of S10 is only on par with that of S6 and S7. In contrast, S9, with only $E_5 + M_3$, obtained an 85.10% accuracy, which is higher than that of S10. With further feature combinations (listed in Table 2), S11 and S12 obtain average validation accuracies of 86.29% and 84.63%, respectively, which are among the highest of all configuration settings. Hence, S9, S11, and S12, which achieve the top-3 accuracies in the ablation study, were selected for further experiments. They are named PCBMTL-2F, PCBMTL-3F, and PCBMTL-4F, respectively, for the rest of this article.
Table 2. Ablation study of the proposed PCBMTL on the BDIP validation set. (Here, and in the following tables, the best result is bolded while the second best is underlined.)

Comparisons with State-of-The-Art Approaches
Tables 3–5 summarize the comparisons of the proposed PCBMTL with state-of-the-art PCB classification approaches and ResNet-50 by evaluating the corresponding test accuracy, $\mathrm{Acc}_p$, on the test set $D_{\mathrm{Test}}$, with $p \in P = \{10, 12, 14, 16, 18, 20\}$. [9,10,12,13,31] We also include a classification and segmentation MTL approach for comparison, although it was originally used for biomedical images. [42] Similar to (13), the average accuracy on the test set, $\mathrm{Acc}_{\mathrm{Avg}}$, is also provided in Tables 3–5.
On the BDIP dataset, as shown in Table 3, PCBMTL-2F, PCBMTL-3F, and PCBMTL-4F achieve the highest average accuracies of 85.31%, 85.92%, and 85.14%, respectively, which outperform the state-of-the-art approaches, whose accuracies range from 30.64% to 72.37%, by large margins. At $p = 10\%$, the proposed PCBMTL achieves accuracies of 75.19% to 78.59%.
Regarding FFC, as shown in Table 5, PCBMTL-2F, PCBMTL-3F, and PCBMTL-4F achieve the highest average accuracies of 86.46%, 86.04%, and 84.28%, respectively, which outperform the other approaches, whose accuracies range from 31.73% to 68.89%, by significant margins. Similar to the trend in FDIP, at $p = 10\%$, the proposed PCBMTL obtains accuracies of 81.53% to 83.91%, which already exceed those of the other approaches at $p = 20\%$, which range from 35.82% to 74.90%. The large improvement demonstrates the superiority of the proposed PCBMTL model.
Table 3. Comparisons of the proposed PCBMTL with state-of-the-art approaches on the BDIP test set.
In addition, the average sensitivity or equivalently average recall rate, R Avg , of each method was also measured.On BDIP, as shown in Table 3, our PCBMTL-2F, PCBMTL-3F, and PCBMTL-4F achieve the highest average recall rates of 84.98%, 85.71%, and 84.92%, respectively, which outperform state-of-the-art approaches of average recall rates ranging from 30.25% to 72.28% by large margins.Similar trends are observed for the FDIP and FFC datasets.On FDIP, as shown in Table 4, our PCBMTL-2F, PCBMTL-3F, and PCBMTL-4F achieve the highest average recall rates of 97.20%, 97.36%, and 97.46%, respectively, which outperform state-of-the-art approaches of average recall rates ranging from 52.31% to 91.92%.On FFC, as shown in Table 5, our PCBMTL-2F, PCBMTL-3F, and PCBMTL-4F achieve the highest average recall rates of 86.23%, 85.79%, and 83.93%, respectively, which outperform state-of-the-art approaches of average recall rates ranging from 31.85% to 68.93%.As our dataset is a balanced dataset, the average recall rate is close to the accuracy.In cases where the dataset is imbalanced, the recall rate is often more informative and relevant than accuracy.
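The remark that recall and accuracy nearly coincide on a balanced dataset can be checked with a small example. The sketch below assumes that the average recall rate is the macro average of per-class recall, which the text does not state explicitly; the label lists are toy data and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed meaning of average recall)."""
    recalls = []
    for c in sorted(set(y_true)):
        support = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / support)
    return sum(recalls) / len(recalls)


# Balanced toy set: four samples per class.
balanced_true = [0, 0, 0, 0, 1, 1, 1, 1]
balanced_pred = [0, 0, 0, 1, 1, 1, 0, 0]
print(accuracy(balanced_true, balanced_pred), macro_recall(balanced_true, balanced_pred))

# Imbalanced toy set: nine normal samples, one defect that is missed.
skewed_true = [0] * 9 + [1]
skewed_pred = [0] * 10
print(accuracy(skewed_true, skewed_pred), macro_recall(skewed_true, skewed_pred))
```

With equal class counts the two metrics are identical, since every class contributes the same weight. In the imbalanced case a classifier that never predicts the defect still scores 90% accuracy but only 50% macro recall, which is why recall is the more informative metric there.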
In summary, PCBMTL-2F, PCBMTL-3F, and PCBMTL-4F consistently outperform other approaches on all three datasets for all percentages of training samples. When the number of training samples is only 10% of the original dataset, the proposed PCBMTL still achieves accuracies of 75.19% to 78.59% on BDIP and 81.53% to 83.91% on FFC, which significantly outperforms the conventional approaches. Overall, PCBMTL is a data-efficient PCB soldering defect detection model that is particularly useful under low-data regimes, when it is challenging or expensive to collect the required training data.

Visualizations of Soldering Region Segmentation
Even though the segmentation task is not necessary during inference, we include a visualization of the segmentation results on the test set for a better understanding of what the PCBMTL-3F network has learned.The visualizations, as shown in Figure 4, are generated using the models trained with 10%, 16%, and 20% of the training data.
In the case of BDIP, we can observe from Figure 4a that for the normal class, the proposed PCBMTL-3F can achieve a good segmentation of the soldering region even when the training sample percentage is 10%. As we increase the training sample percentage to 20%, the segmentation becomes much closer to the ground truth. In the case of FDIP, as shown in Figure 4f, the proposed model successfully segments the solder without including the DIP pins. Even though only a rough ground-truth segmentation mask is manually annotated in Figure 4h, the proposed model can still accurately segment the solder without misjudging the pins as solder. Similarly, in FFC, for example, in Figure 4j, the proposed model accurately differentiates between silkscreen and soldering areas.
The visualization of segmentation results provides insights into the model's learning and helps identify areas where the model performs well or poorly. The segmentation results also lend confidence that the proposed model bases its classification of soldering defects on the actual solder regions.

Further Enhancement with Pretraining
ImageNet pretraining is a popular technique for initializing the weights of a model using the massive ImageNet dataset, which consists of approximately 1.2 million training images. [33] The pretrained model is subsequently fine-tuned for a specific downstream task. It can help in the low-data regime since the amount of task-related training data can be reduced. In this section, we investigate the effectiveness of our proposed MTL approach for ImageNet pretrained models on our PCBSPDefect dataset. We first pretrained the encoder and the classification branch of the proposed PCBMTL on ImageNet using (11). Then, during the fine-tuning phase, the entire model was further trained on our BDIP dataset using MTL via (12). The results are presented in Tables 6–8.
On BDIP (Table 6), our model pretrained on ImageNet without MTL (referred to as PT) achieves an average accuracy of 82.87%. However, when we utilize both pretraining and MTL, our models, PT + PCBMTL-2F, PT + PCBMTL-3F, and PT + PCBMTL-4F, achieve higher average accuracies of 86.90%, 87.02%, and 86.90%, respectively, a margin of about 4 percentage points over PT. This suggests that although ImageNet pretraining can help the model learn rich image representations, our proposed MTL approach can further improve the classification performance.
On FDIP (Table 7), our model pretrained on ImageNet without MTL (referred to as PT) already achieves an average accuracy of 97.96%, since this is a relatively easier task. Yet, similar to the BDIP case, when we utilize both pretraining and MTL, our models, PT + PCBMTL-2F, PT + PCBMTL-3F, and PT + PCBMTL-4F, achieve higher average accuracies of 99.26%, 99.31%, and 99.09%, respectively, a margin of over 1 percentage point. This again shows that our proposed MTL approach can further improve the classification performance even when pretraining is utilized.
All these results are considerably better than those of the traditional PCB soldering defect detection methods reported in Tables 3–5.
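The margins quoted for BDIP and FDIP follow directly from the average accuracies stated in the text. The short check below recomputes them. The dictionary layout and the helper name `margins_over_pt` are illustrative choices, and only figures quoted in this section are used.

```python
# Average accuracies (%) quoted in the text for Tables 6 and 7.
reported = {
    "BDIP": {
        "PT": 82.87,
        "PT + PCBMTL-2F": 86.90,
        "PT + PCBMTL-3F": 87.02,
        "PT + PCBMTL-4F": 86.90,
    },
    "FDIP": {
        "PT": 97.96,
        "PT + PCBMTL-2F": 99.26,
        "PT + PCBMTL-3F": 99.31,
        "PT + PCBMTL-4F": 99.09,
    },
}


def margins_over_pt(scores):
    """Gain in percentage points of each MTL variant over the PT baseline."""
    baseline = scores["PT"]
    return {
        name: round(accuracy - baseline, 2)
        for name, accuracy in scores.items()
        if name != "PT"
    }


margins = {dataset: margins_over_pt(scores) for dataset, scores in reported.items()}
for dataset, gains in margins.items():
    for name, gain in gains.items():
        print(f"{dataset}: {name} is {gain:.2f} points above PT")
```

The differences are expressed in percentage points of average accuracy, not as relative improvements.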

Conclusion
In PCB assembly, identifying soldering defects is crucial for enhancing manufacturing reliability. AOI using deep learning is a promising solution for this task. Yet, for many practical reasons, training samples are scarce, which makes it challenging to train deep learning models for PCB soldering defect detection under low-data regimes. To address this, we have proposed a novel MTL deep learning model, PCBMTL, which simultaneously learns the segmentation and classification tasks. With the learned segmentation knowledge, the classification performance is improved even when only a small number of training samples is available. To facilitate MTL, we also built the PCBSPDefect dataset, which covers three components, BDIP, FDIP, and FFC, with corresponding segmentation masks. Experimental results show that the proposed PCBMTL achieves the highest accuracies of 85.92%, 97.44%, and 86.46% on the BDIP, FDIP, and FFC test sets, respectively, outperforming the best prior approaches by over 13%, 5%, and 17%, respectively. Further improvements are obtained when the models are pretrained on ImageNet. To the best of our knowledge, this is the first work to provide a soldering defect dataset with both classification and segmentation annotations, and the first to apply MTL to PCB soldering AOI. We believe that the proposed framework can improve the performance of soldering defect inspection while reducing the need for large datasets.

Figure 1. Image samples and the corresponding binary segmentation masks in the proposed PCBSPDefect dataset for soldering defect detection: a) DIP at PCB back side (BDIP), b) DIP at PCB front side (FDIP), and c) flat flexible cables (FFC).

Figure 2. a) Image acquisition system and b) BDIP, FDIP, and FFC image samples captured under different lighting levels and their corresponding segmentation masks.

Figure 3. Overview of the proposed MTL framework, PCBMTL. The convolutional layer, fully connected layer, downsampling, upsampling, and global average pooling are denoted as Conv, FC, Down, Up, and GAP, respectively.

Table 1. Summary of the BDIP, FDIP, and FFC datasets. (Numbers within brackets are the number of PCBs used.)

Table 4. Comparisons of the proposed PCBMTL with state-of-the-art approaches on the FDIP test set.

Table 5. Comparisons of the proposed PCBMTL with state-of-the-art approaches on the FFC test set.

Table 6. Effectiveness of the proposed PCBMTL approach for ImageNet pretrained (PT) models on the BDIP test set.

Table 7. Effectiveness of the proposed PCBMTL approach for ImageNet pretrained (PT) models on the FDIP test set.

Table 8. Effectiveness of the proposed PCBMTL approach for ImageNet pretrained (PT) models on the FFC test set.