Research on image classification of coal and gangue based on a lightweight convolution neural network

It is vital to differentiate between coal and gangue effectively in coal mining. In recent years, image classification methods for coal and gangue based on convolutional neural networks have emerged, making the analysis process simple and fast. However, these methods are still limited in balancing recognition performance and computational efficiency because of device hardware constraints. This research proposes a group convolution and channel shuffle augmentation Ghost network (GSAGNet) for the classification of coal and gangue images. First, we design a new Ghost module with group convolution and channel shuffle to improve the stability of the Ghost module and better extract features of coal and gangue. Second, we redesign GhostNet to improve the model's classification accuracy while keeping the floating-point operations comparable. Finally, an efficient channel attention module is embedded at the prediction output of the model to further improve its predictions. Extensive experiments on real image datasets of coal and gangue show the feasibility and superiority of the proposed GSAG module. Applied to models such as AlexNet, VGG13, and ResNet50, the GSAG module improves accuracy by 0.27%, 1.24%, and 1.25%, respectively, compared with the Ghost module. The classification accuracy and F1-score of the proposed GSAGNet model reach 97.50% and 97.50%, respectively, which are 1.01% and 1.04% higher than those of GhostNet. The GSAGNet model can classify coal and gangue efficiently, enabling effective separation of underground coal and gangue, and has practical application significance and value.


| INTRODUCTION
With the implementation of various technical measures to promote the green and sustainable development of the coal industry, energy from coal will still play an essential role in China's future energy system, despite the decline in the share of coal consumption. 1 However, gangue, as an associated material in coal mining, seriously affects raw coal production. 2 This is because the mineral composition of gangue mainly includes SiO2 and Al2O3, which also contain various heavy metal elements. 3 Thus, in the process of calcining gangue, complex transformations occur between minerals and trace elements, resulting in the discharge of many heavy metal pollutants and exerting adverse effects on human health and the environment. 4,5 Therefore, accurate separation of coal and gangue is crucial for effectively utilizing coal energy. At the same time, it is especially critical to carry out precise classification before the separation of coal and gangue.
The traditional coal and gangue separation methods are divided into artificial coal separation and wet and dry automatic coal separation. 6 Artificial coal selection primarily relies on workers' experience, which is inefficient and can be hazardous to workers' health. 7 The wet machine sorting method separates accurately based on the density difference between coal and gangue, but at the cost of severe environmental pollution and wastage of water resources. 8 Among these methods, the dry coal separation method includes the ray and machine vision separation methods. [9][10][11] Although it is convenient to use radiation sensors to distinguish between coal and gangue, there is a radiation hazard and the equipment is expensive. At the same time, with the rapid development of image processing and pattern recognition, coal and gangue recognition technology based on image processing has attracted increasingly more attention in the coal mining field. 12 Based on this, Liang et al. extracted critical features from a grayscale histogram to differentiate between coal and gangue based on the grayscale difference between their images. 13 Yu et al. proposed a method to extract features using extended coexisting partial grayscale compression on the grayscale information of coal and gangue images to classify the two. 14 Hobson et al. analyzed the textural characteristics of coal and gangue images and distinguished the two materials based on their textural differences. 15 Li et al. combined these two features and used the image's spatial information to improve the classification accuracy of coal and gangue. 16 In fact, the above classification studies depend mainly on the design of manual features. These methods not only rely too much on high-quality images of coal and gangue but are also susceptible to the effects of illumination, which leads to inconsistent classification results. 17

In recent years, with the continuous upgrading of computing systems, some complex architectures based on deep convolutional neural networks (CNNs) have shown promising results in coal and gangue classification tasks. For example, Hong et al. proposed a method based on the AlexNet architecture and transfer learning to identify coal and gangue automatically. 18 A general trend is the development of deeper and more complex models with higher classification accuracy. [24][25][26] However, due to the limited storage space and processor performance of mobile or embedded devices, the significant computational cost prevents these devices from running such complex networks. Exploring models that can achieve the best accuracy with a minimal computational budget is necessary to satisfy the application requirements. Meanwhile, the development of small and efficient CNNs provides a valuable direction, with examples such as MobileNet, ShuffleNet, GhostNet, and EfficientNet. [27][28][29][30] In addition, ECANet provides a lightweight architectural unit that can be embedded into current models to improve model performance at a small computational cost. 31 In this study, we propose a model combining an ECA module and an efficient, lightweight CNN for image classification of coal and gangue. Specifically, we redesign the backbone of the model based on GhostNet. The feature extraction and information interaction capabilities of the Ghost module are improved by using group convolution and channel shuffle, respectively. The ECA module can effectively capture cross-channel interactions with a slight increase in computational parameters, allowing for more effective target characterization. Therefore, our proposed GSAGNet encodes more information and achieves better classification performance with limited computational resources. A series of comparative experiments on the classification of coal and gangue demonstrates the feasibility and superiority of the model designed in this study.
Its classification accuracy is better than that of other corresponding structures. The remainder of this paper is organized as follows: Section 2 describes the data sets used for training and testing and outlines the model framework. Section 3 presents the results of experiments conducted on real data and compares them with other models to demonstrate the superiority of the proposed model. Finally, Section 4 summarizes the paper's innovations and discusses possible future directions.

| MATERIALS AND METHODS
The image classification method of coal and gangue proposed in this study consists of three steps. First, raw images of coal and gangue samples are collected, and the data set is divided. Then, the acquired image data set is enhanced and preprocessed. Finally, the coal and gangue are classified using the improved GhostNet model.

| Materials and image sampling
In this study, coal and gangue samples were obtained mainly from a mining area in East China. We used the visible light mode of the FLIR E50 thermal imaging camera (FLIR Systems, USA) for data acquisition. We randomly selected 200 pieces each of coal and gangue for sampling, yielding a total of 400 pictures of coal and gangue. The experimental data set is divided into a training set and a testing set; the former contains 320 images, accounting for 80%, and the latter contains 80 images, accounting for 20%. There is no overlap between the training and testing sets.

| Data enhancement and preprocessing
Considering the insufficient number of our homemade images, and to avoid model overfitting while improving the generalization ability of the classification model, we applied various image enhancement methods to the images of the data set using Python and its associated image processing functions. The specific image enhancement methods include multiangle rotation, mirroring, and random brightness. Considering the noise generated by the acquisition device, we also add Gaussian noise and salt noise to the images. In addition, we apply motion blur to the images to simulate the blurring caused by equipment shaking. Taking the pictures of coal in the training set as an example, the detailed distribution of the data is shown in Figure 1.
Meanwhile, the specific division of the gangue image data set before and after enhancement is shown in Table 1.
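The augmentation pipeline described above can be sketched directly with tensor operations. The snippet below is illustrative only; the rotation angles, brightness factor, and noise levels are assumed example values, not the exact settings used in this study.

```python
import torch

def augment(img: torch.Tensor) -> dict:
    """Generate augmented variants of a (C, H, W) image tensor in [0, 1],
    mirroring the paper's augmentations: rotation, mirroring, brightness,
    Gaussian noise, and salt noise. Parameter values are illustrative."""
    out = {}
    out["rot90"] = torch.rot90(img, 1, dims=(1, 2))            # multiangle rotation
    out["mirror"] = torch.flip(img, dims=(2,))                 # horizontal mirroring
    out["bright"] = (img * 1.2).clamp(0.0, 1.0)                # brightness change
    out["gauss"] = (img + 0.05 * torch.randn_like(img)).clamp(0.0, 1.0)
    salt = (torch.rand_like(img) < 0.02).float()               # ~2% pixels to white
    out["salt"] = torch.maximum(img, salt)
    return out
```

Each variant keeps the original image shape, so the enlarged data set can be fed to the same model input without resizing.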

| Group convolution and channel shuffle augmentation Ghost module
Based on the finding of redundancy in the intermediate feature maps of CNNs, Han et al. proposed the Ghost module to replace traditional convolution, which significantly reduces the number of model parameters while ensuring the model's performance. 29 The core idea is to divide the standard convolution into two parts. First, a small number of filters is used to generate intrinsic feature maps from the input features. Next, cheap linear mappings are performed on these intrinsic feature maps to generate the Ghost feature maps. The specific process is shown in Figure 2A.
In Figure 2, c, H, and W denote the number of channels, height, and width of the input feature maps, respectively; n, H′, and W′ denote the number of channels, height, and width of the output feature maps, respectively; k represents the size of the convolution kernel; n′ represents the number of channels of the intrinsic feature maps; n − n′ represents the number of channels of the Ghost feature maps; and g represents the number of groups of the group convolution.
Although the computational complexity of the model can be greatly reduced by using the Ghost module, some of the convolution operations in this module can only extract a small number of key features, which cannot always guarantee high accuracy and leads to poor stability. 32 Simply enlarging the standard convolution kernels can improve the results to some extent, but overly large kernels introduce many redundant parameters and can lead to overfitting. We therefore introduce group convolution in place of the original convolution in the Ghost module to avoid these problems. 33,34 Group convolution can increase the diagonal correlation between adjacent layer filters and reduce the training parameters, making training less prone to overfitting, with an effect similar to regularization. In addition, group convolution reduces the FLOPs required by the Ghost module, further lightening the model.
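The parameter saving from grouping can be checked directly in PyTorch. The channel counts below are arbitrary examples, not the paper's exact layer settings; with g groups, each filter sees only c/g input channels, so the parameter count drops by a factor of g.

```python
import torch
from torch import nn

# Compare a standard 3x3 convolution with the same convolution split
# into g = 2 groups, as used in the GSAG module (example channel counts).
c_in, c_out, k, g = 64, 64, 3, 2

standard = nn.Conv2d(c_in, c_out, k, padding=1, bias=False)
grouped = nn.Conv2d(c_in, c_out, k, padding=1, groups=g, bias=False)

n_std = sum(p.numel() for p in standard.parameters())  # c_in * c_out * k * k
n_grp = sum(p.numel() for p in grouped.parameters())   # (c_in / g) * c_out * k * k
print(n_std, n_grp)  # grouped convolution uses 1/g of the parameters
```

Both layers produce outputs of identical shape, so the grouped version is a drop-in replacement wherever channel counts are divisible by g.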
Meanwhile, in the specific implementation of the Ghost module, the cheap linear operation is generally equivalent to depthwise convolution by default. However, the depthwise convolution only blends the spatial dimensions, and the information between individual channels is not interacted with, resulting in the loss of some information between channels in the subsequent flow of information. This will affect the representational ability and recognition accuracy of the model. To solve this problem, we introduce a channel shuffle operation in the Ghost module to enable information flow. 28 Without increasing the computational cost of the model, the input of the next Ghost module is guaranteed to contain more valid information by disrupting and reorganizing the intrinsic feature map and Ghost feature map. Thus, information can flow between different channels, which helps to encode more information and improve the robustness of the model.
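The channel shuffle operation referred to above (as introduced in ShuffleNet) can be written as a reshape, transpose, and flatten; this is a minimal sketch, not the authors' implementation:

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across groups so that information from the
    intrinsic and Ghost feature maps can mix in subsequent layers."""
    b, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    x = x.view(b, groups, c // groups, h, w)   # split channels into groups
    x = x.transpose(1, 2).contiguous()         # interleave the groups
    return x.view(b, c, h, w)                  # flatten back
```

For example, with 8 channels and 2 groups, the channel order [0..7] becomes [0, 4, 1, 5, 2, 6, 3, 7], so each output position mixes channels from both halves.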
On the basis of the above description, we propose a GSAG module. Figure 2B depicts the implementation process, which consists of four basic components. The intrinsic feature maps are first generated in part using group convolution. The intrinsic feature maps are then subjected to a linear modification to create the Ghost feature maps. The intrinsic feature maps and the Ghost feature maps are combined. The channels of the stitched feature maps are shuffled in the end.
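The four components above can be assembled into a single module. The following is a sketch of the described design, not the authors' code: the ratio of intrinsic to output channels, kernel sizes, and group count are assumed example values, and the cheap linear operation is taken to be a depthwise convolution as is conventional for Ghost modules.

```python
import torch
from torch import nn

class GSAGModule(nn.Module):
    """Sketch of the GSAG module: group convolution -> intrinsic maps,
    depthwise conv as the cheap linear operation -> Ghost maps,
    concatenation, then channel shuffle."""
    def __init__(self, c_in, c_out, ratio=2, k=1, d=3, groups=2):
        super().__init__()
        n_intr = c_out // ratio                       # n': intrinsic channels
        self.primary = nn.Sequential(                 # group convolution
            nn.Conv2d(c_in, n_intr, k, padding=k // 2, groups=groups, bias=False),
            nn.BatchNorm2d(n_intr), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(                   # depthwise "cheap" op
            nn.Conv2d(n_intr, c_out - n_intr, d, padding=d // 2,
                      groups=n_intr, bias=False),
            nn.BatchNorm2d(c_out - n_intr), nn.ReLU(inplace=True))
        self.groups = groups

    def forward(self, x):
        intrinsic = self.primary(x)
        ghost = self.cheap(intrinsic)
        out = torch.cat([intrinsic, ghost], dim=1)    # stitch feature maps
        b, c, h, w = out.shape                        # channel shuffle
        out = out.view(b, self.groups, c // self.groups, h, w)
        return out.transpose(1, 2).contiguous().view(b, c, h, w)
```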
Since the GSAG module operates group convolution on the input feature map, the computational complexity is lower than that of the Ghost module. The computational complexity equations for the Ghost and GSAG modules are Equations (1) and (2), respectively.
where the factor 2 arises because each multiply-accumulate (MAC) counts as two operations; c is the number of input feature map channels; H′ and W′ are the height and width of the output feature map, respectively; k is the size of the convolution kernel; n′ is the number of channels of the intrinsic feature map; n is the number of channels of the output feature map; and d is the kernel size of the cheap linear operation, which is similar in magnitude to k.
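Equations (1) and (2) themselves do not survive in this text. A plausible reconstruction, consistent with the symbol definitions above and with the standard complexity analysis of Ghost-style modules (a sketch, not the authors' exact typesetting), is:

```latex
% (1) Ghost module: primary convolution + cheap depthwise operation
\mathrm{FLOPs}_{\text{Ghost}} = 2\, H' W' \left( n' \cdot c \cdot k^{2} + (n - n')\, d^{2} \right)

% (2) GSAG module: the primary convolution is split into g groups,
% so each filter sees only c/g input channels
\mathrm{FLOPs}_{\text{GSAG}} = 2\, H' W' \left( n' \cdot \frac{c}{g} \cdot k^{2} + (n - n')\, d^{2} \right)
```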
F I G U R E 1 Distribution of coal in the training set.
T A B L E 1 Coal and gangue image data set before and after data enhancement.
Therefore, according to Equations (1) and (2), the theoretical complexity ratio of the GSAG module to the original Ghost module can be derived, where g ∈ ℕ*, c ≫ g, (n − n′)/n′ ∈ ℝ⁺, and c ≫ (n − n′)/n.
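Under the complexity expressions reconstructed above (a sketch; the published equations may differ in form), the saving from grouping can be checked numerically. The values of c, n, n′, and the output resolution below are illustrative only:

```python
# Compare the (reconstructed) Ghost vs. GSAG complexity for example values.
def flops_ghost(c, n, n_intr, hw, k=1, d=3):
    """2 * H'W' * (n' * c * k^2 + (n - n') * d^2)"""
    return 2 * hw * (n_intr * c * k * k + (n - n_intr) * d * d)

def flops_gsag(c, n, n_intr, hw, k=1, d=3, g=2):
    """Same, but the primary convolution is grouped: c -> c / g."""
    return 2 * hw * (n_intr * (c // g) * k * k + (n - n_intr) * d * d)

c, n, n_intr, hw = 64, 32, 16, 28 * 28   # illustrative layer settings
print(flops_ghost(c, n, n_intr, hw))     # Ghost module cost
print(flops_gsag(c, n, n_intr, hw))      # GSAG cost is strictly lower
```

When c is large relative to g and to (n − n′)/n, the first term dominates and the ratio approaches g, matching the conditions stated above.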

| The establishment of the coal and gangue classification model based on the GSAG module
Using GSAG modules, we create a novel bottleneck structure inspired by GhostNet. As can be seen in Figure 3A, the bottleneck structure comprises two stacked GSAG modules when the step size equals 1. In the diagram, SE stands for the SE attention mechanism. 35 As indicated by the dashed box, the connection is bypassed when SE is not utilized. BN refers to batch normalization, and ReLU represents the nonlinear activation function. When the step size is equal to 2, as shown in Figure 3B, a depthwise convolution with step size 2 is inserted between the two GSAG modules and in the shortcut path. In addition, the ECA attention mechanism is an efficient channel attention module proposed on the basis of the SE module. The structural differences between the two are shown in Figure 4. The fully connected dimensionality reduction in the SE module lowers the network model's complexity, but it also eliminates the direct correspondence between weights and channels, which impairs correlation prediction. The ECA attention module allows local cross-channel interaction without dimensionality reduction and has strong cross-channel information collection capabilities. Additionally, although it introduces only a few parameters, this module can significantly enhance performance. The lightweight ECA attention module is therefore selected to further lighten the model.
In Figure 4, GAP stands for global averaging pooling, FC stands for fully connected layer, k stands for coverage of cross-channel interactions, σ stands for the sigmoid activation function, and  stands for an element-wise product.
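The ECA path in Figure 4 (GAP, a 1D convolution of kernel size k across channels, sigmoid, element-wise product) can be sketched as follows; k = 3 is a typical default, not necessarily the value used in this study:

```python
import torch
from torch import nn

class ECA(nn.Module):
    """Efficient channel attention: global average pooling, a 1D conv over
    the channel dimension (kernel k controls the coverage of cross-channel
    interaction), a sigmoid gate, then element-wise reweighting."""
    def __init__(self, k: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):
        b, c, _, _ = x.shape
        y = x.mean(dim=(2, 3))                     # GAP: (b, c)
        y = self.conv(y.unsqueeze(1)).squeeze(1)   # local cross-channel conv
        w = torch.sigmoid(y).view(b, c, 1, 1)      # per-channel weights
        return x * w                               # element-wise product
```

Unlike SE, there is no dimensionality-reducing fully connected layer, so each weight corresponds directly to its channel.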
The image classification model of coal and gangue proposed in this study mainly consists of the GSAG_Bottleneck sequences and the ECA attention mechanism module. Its detailed network structure is shown in Table 2. The first layer is a conventional convolutional layer that extracts fundamental features, followed by a series of GSAG_bottleneck blocks. According to the size of their input feature maps, these bottleneck blocks are divided into stages, and only the final GSAG bottleneck in each stage has a step size of 2 for downsampling. After the GSAG_bottleneck blocks, an efficient ECA module is fused at the model's prediction output. It is responsible for strengthening the interaction among the high-dimensional output channels and enhances the performance of the coal and gangue classification model by further extracting important feature information at a low computational cost.
In Table 2, Input represents the size of the input feature map, #exp represents the number of channels in the layer GSAG_bottleneck that the first GSAG module has processed, and #out represents the number of channels output from that layer. SE represents the addition of the SE layer to the GSAG Bottleneck.

| Experimental conditions
The experiments are all performed in the Windows 11 64-bit operating system environment, utilizing the open-source deep learning framework PyTorch version 1.9.0 with CUDA 11.1.74 for training. The computer is equipped with a GTX 2060 graphics card, 16 GB of RAM, and an Intel(R) Core(TM) i5-12600KF CPU running at 3.69 GHz. In the experiments, the final fully connected layer of every model is modified to output two neurons, representing coal and gangue, and all models are trained with the same training scheme. Multicategorical cross-entropy is selected as the loss function, and Adam is chosen as the optimizer. A learning rate decay schedule with equal intervals is adopted to improve the model's convergence. The hyperparameters used in the experiments are listed in Table 3.
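The training configuration above maps directly onto standard PyTorch components. The snippet is a sketch: the stand-in linear model, batch contents, and the decay factor gamma are assumptions (the paper states the learning rate and step size but not gamma).

```python
import torch
from torch import nn

# Stand-in two-class classifier (not GSAGNet) to illustrate the setup.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))
criterion = nn.CrossEntropyLoss()                  # multicategorical cross-entropy
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# Equal-interval learning rate decay; step_size=5 per Table 3, gamma assumed.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

x = torch.randn(4, 3, 32, 32)                      # dummy batch
labels = torch.tensor([0, 1, 0, 1])                # 0 = coal, 1 = gangue
loss = criterion(model(x), labels)
loss.backward()
optimizer.step()
scheduler.step()
```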

| Evaluation metrics
In this study, accuracy, precision, recall, and F1-Score are used to evaluate the model's classification performance. With coal taken as the positive class, TP is the number of coal samples correctly classified as coal, FP is the number of gangue samples incorrectly classified as coal, TN is the number of gangue samples correctly classified as gangue, and FN is the number of coal samples incorrectly classified as gangue. The metrics are defined as follows: Accuracy = (TP + TN)/(TP + FP + TN + FN), Precision = TP/(TP + FP), Recall = TP/(TP + FN), and F1-Score = 2 × Precision × Recall/(Precision + Recall).
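These standard definitions can be computed from the four counts directly; the example counts in the usage below are hypothetical, not the paper's confusion matrix:

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int):
    """Standard binary-classification metrics, with coal as the positive
    class as in the definitions above."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```

For example, with hypothetical counts tp=40, fp=2, tn=38, fn=0 on an 80-image testing set, accuracy is 78/80 = 0.975 and recall is 1.0.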

| Performance analysis of the GSAG module
As introduced in the improvement algorithm above, we mainly use group convolution instead of the standard convolution in the Ghost module. The number of groups of the group convolution is a very important parameter affecting both the model's complexity and accuracy. Therefore, we conduct an analysis to determine the number of groups. To further verify the superiority of our proposed module, a series of comparative experiments is then conducted to compare its performance with that of the Ghost module.
T A B L E 3 Hyperparameters of the classification model (learning rate: 0.001; step size: 5).
| Performance comparison with the Ghost module
We conduct further comparison experiments between the GSAG module and the Ghost module using the value of g determined in the above experiments. The benchmark models are made lighter by substituting the convolutional layers of AlexNet, VGG13, and ResNet50 with the Ghost module and the GSAG module, respectively. Table 5 shows the comparison of results. It can be seen that both the Ghost module and the GSAG module significantly reduce the FLOPs of the base models. Meanwhile, for the AlexNet and VGG13 models, the accuracy is still improved after embedding the Ghost module and the GSAG module. Notably, GSAG_AlexNet and GSAG_VGG13 reduce the number of FLOPs by 31.97% and 47.43%, respectively, and improve the accuracy by 0.27% and 1.24% compared with Ghost_AlexNet and Ghost_VGG13. Compared with Ghost_ResNet50, GSAG_ResNet50 still achieves higher accuracy with lower FLOPs.
From the above experimental results, we may infer that our proposed GSAG module can further reduce the complexity of the model compared with the Ghost module while maintaining a higher accuracy of the model. This indicates that the GSAG module has some feasibility.

| Ablation experiments for channel shuffle and the ECA module
The accuracy of the coal and gangue classification model is improved to some extent by both the Channel Shuffle and the ECA module shown above. As a result, this study analyzes how well these two components work. According to the addition situation, we designed the following four experiments, that is, (a) without adding any operation, (b) adding only Channel Shuffle, (c) adding only the ECA module, and (d) adding both Channel Shuffle and the ECA module. The experimental results are shown in Figure 5.
Adding channel shuffle alone increases the accuracy, precision, recall, and F1-Score of the model by 0.34%, 0.33%, 0.34%, and 0.33%, respectively, according to experiments (a) and (b) shown in Figure 5. The channel shuffling also does not incur any further computation costs. According to experiments (a) and (c) in Figure 5, adding the ECA module alone increases the model's accuracy, precision, recall, and F1-Score by 0.57%, 0.55%, 0.57%, and 0.57%, respectively. The effectiveness and precision of the network's information extraction may be increased by using a single ECA module, which can significantly increase the model's classification accuracy. Additionally, the complexity of the overall classification model for coal and gangue makes the additional computing work delivered by a single ECA module insignificant.
The areas with higher classification weights, which reflect the sensitive regions of the input images, are investigated to visualize the weight distribution of GSAGNet with the embedded ECA attention module in the coal and gangue image classification task. In this experiment, gradient-weighted class activation mapping (Grad-CAM) is introduced for model visualization. 36
T A B L E 4 Influence of different g values.
In the Grad-CAM map, red represents the high classification weight region and blue represents the low classification weight region. The results of the visualization are shown in Figure 6. Each input image has its true label at the top, and P stands for the softmax score of each model for the true-value class. Overall, GSAGNet with the embedded ECA module has a different classification weight distribution on the images of coal and gangue; that is, the ECA module affects the feature extraction process of the GSAGNet model. Specifically, the high classification weight regions covering the coal and gangue surfaces in the images become substantially larger once the ECA module is embedded in the model. In addition, the red areas are more concentrated and their color gradually becomes darker, indicating larger classification weights. The performance of the network model is thus improved by adding the ECA module, with improved interpretability. Among the four tested experiments, (d) has the best classification accuracy. The results suggest that both strategies affect the classification accuracy of GSAGNet, and the model with both revised strategies outperforms the model with only one of them. This experiment further demonstrates the feasibility and effectiveness of the GSAGNet coal and gangue classification network.
F I G U R E 5 Classification metrics of coal and gangue for the four models.

| Comparison with mainstream lightweight networks
To verify the superiority of our proposed model for image classification of coal and gangue, several currently popular lightweight CNNs are selected for comparison in this study. To ensure the fairness of the experiments, we unify the FLOPs of all the models at the same level. The results are shown in Figure 7, where the decimal after each network name is the controllable width factor. Since there are only two classification categories, the classification accuracy of all methods is relatively high, above 95%. For image classification of coal and gangue, the GSAGNet proposed in this study achieves the best classification results on the testing set, with an accuracy of 97.50%. Compared with MobileNetV2, ShuffleNetV2, GhostNet, and EfficientNet, the accuracy is improved by 1.78%, 0.79%, 1.01%, and 0.95%, respectively. In addition, the results of other metrics are shown in Table 6.
As observed, GSAGNet outperforms other models of comparable complexity in terms of classification accuracy for coal and gangue. More precisely, GSAGNet's precision, recall, and F1-Score are 97.63%, 97.50%, and 97.50%, respectively. Among the compared models, the classical lightweight model MobileNetV2 has the lowest classification accuracy. Compared with MobileNetV2, the precision, recall, and F1-Score of GSAGNet improve by 1.41%, 1.78%, and 1.8%, respectively. Compared with the F1-Scores of ShuffleNetV2, GhostNet, and EfficientNetB0, GSAGNet improves by 0.79%, 1.04%, and 0.96%, respectively. In general, the strategy proposed in this study is more competitive on coal and gangue image data sets at the same level of model complexity.

| CONCLUSIONS
We propose a lightweight convolutional neural network for coal and gangue image classification in this paper. First, we combine group convolution and channel shuffle to enhance the original Ghost module and obtain the GSAG module. Then, a new bottleneck architecture is obtained by stacking GSAG modules to extract image features. Finally, a new GSAGNet classification model for coal and gangue is developed using this bottleneck architecture and the ECA module. The proposed strategy is validated on real image data sets of coal and gangue, with accuracy, precision, recall, and F1-Score as the classification performance metrics. The experimental results prove that the proposed model is feasible and achieves the best classification performance. On the coal and gangue data sets, the accuracy, precision, recall, and F1-Score of GSAGNet improve by 1.01%, 0.84%, 1.01%, and 1.04%, respectively, relative to GhostNet. In addition, the following conclusions can be drawn from the results:
1. In the Ghost module, using group convolution rather than conventional convolution decreases module complexity and eliminates the issue of redundant model parameters. This study reduces the complexity of the training model while guaranteeing classification accuracy to some extent by setting the crucial parameter g in the group convolution to 2.
2. The addition of channel shuffle helps the Ghost module improve the characterization of important information. This is because channel shuffle uses disruption and reorganization to allow better information flow and encode more information, thus improving the robustness of the model.
3. The embedding of the lightweight ECA module helps GSAGNet focus on the coal and gangue targets in images during training. At the same time, it improves the efficiency and accuracy of coal and gangue classification by adjusting a small number of parameters to filter out irrelevant information.
4. The algorithm in this paper has an advantage in terms of coal and gangue classification performance over several lightweight networks with similar FLOPs. The method can identify the type of coal and gangue accurately and efficiently. It provides some theoretical guidance for the underground realization of coal and gangue separation and has a certain promotion value.
Despite the results achieved in this study, we are also aware of the potential impact of metrics such as memory access time, inference speed, and network parallelism in assessing computational efficiency. In future research, we will further explore these metrics with the aim of fast localization and detection of coal and gangue to obtain a comprehensive and accurate method for evaluating computational efficiency. Our ultimate goal is to deploy the algorithm to mobile, embedded devices for practical applications.
AUTHOR CONTRIBUTIONS Zhenguan Cao conceived the study and supervised the study. Zhenguan Cao and Liao Fang developed the method and wrote the manuscript. Liao Fang, Rui Li, and Xun Yang implemented the algorithms. Liao Fang, Jinbiao Li, and Zhuoqin Li analyzed the data. All authors read and approved the final manuscript and content of the work.