An improved picture-based prediction method of PM2.5 concentration

PM2.5 seriously harms people's health and lives because it easily causes cardiovascular disease and increases the risk of cancer. Hence, real-time monitoring of PM2.5 has become a key problem in environmental protection. To this end, this paper proposes an improved picture-based prediction method of PM2.5 concentration using an artificial neural network (ANN). Firstly, the weather image is transformed into the Hue, Saturation, Value (HSV) color space to extract its saturation map, and the corresponding spatial and transform-based entropy features are then extracted. Secondly, the PM2.5 concentration model is built on the two features extracted from the weather image using ANN theory. Thirdly, an ANN model is trained using the pre-processed data. The training parameters and conditions are also explored through multiple experiments to achieve the best model accuracy. Experimental results show that the model achieves the best prediction performance compared with other state-of-the-art models.


INTRODUCTION
Pollution has become a global issue [1], and PM2.5 is considered one of the principal factors [2]. PM2.5, an invisible particle with an aerodynamic diameter of less than 2.5 μm, has seriously threatened people's health [3] and brings incalculable hidden dangers [4] and inconveniences to people's lives [5]-[7]. Once breathed into the lungs, it cannot be removed; as a result, human bronchi and lungs are seriously affected [8]-[10], especially in young people [11]. Moreover, too much PM2.5 can also cause air pollution and reduce visibility, leading to traffic jams and increased travel risks.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2021 The Authors. IET Image Processing published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology
At present, there are two kinds of PM2.5 measurement methods. The first is traditional physical or chemical measurement. Two commonly used methods in this category are the weight method and the micro oscillating balance method. In [12], a surface acoustic wave (SAW) sensor based on a delay line structure is used to measure both PM10 and PM2.5. Taking advantage of its weight sensitivity and operating at 125 MHz, this sensor can measure PM10 and PM2.5 from candle smoke in the concentration range of 0 to 500 μg·m⁻³. In [13], a laser PM2.5 detector based on light scattering theory and a single-chip microcomputer is designed. By modeling and calculating the mapping between the detector's measurement data and the standard monitoring data, the accuracy of the detector is effectively improved. Overall, the traditional methods are still widely used; however, they all share the disadvantage of high cost, because human labor is required to install and maintain the instruments. The second kind comprises artificial methods [14]-[17], which extract PM2.5 information using image processing techniques [18]-[22]. In most cases, to extract the color features of an image, the RGB information is transformed to HSV space, and the PM2.5 concentration is then obtained by linear mapping. In recent years, with the advent of big data, images with real-time information have become easier to obtain and store than ever.
As a result, image-based methods are very useful for identification, prediction, observation and analysis in many fields, especially in special application domains such as root density recognition [23]-[24], river turbidity measurement [25], smoke detection [26] and economics. Ke et al. estimated PM2.5 concentration from images without manual annotation for the first time [15]. In that work, the entropy features of meteorological photos without PM2.5 are extracted, and a natural statistics (NS) model is established by exploiting the fact that PM2.5 concentration strongly influences image saturation. The relative PM2.5 concentration is then obtained through a non-linear mapping of a likelihood measure. Two kinds of features are extracted from the same image: the gradient similarity and the distribution shape of pixel values in the saturation image. A preliminary estimate of the PM2.5 concentration is obtained, and a non-linear function maps this initial estimate to the actual PM2.5 concentration. Zhan et al. collected 70 samples of PM2.5 from a dusty environment to analyze THz radiation [27]. In their research, the partial least squares method and an artificial neural network are used for modeling, and the PM2.5 THz spectrum and pollutant levels are finally analyzed using statistical methods. Xu et al. pay special attention to the dynamic relationship between multiple stations. They first propose an ANN to simulate the transport process of air pollutants and then construct a long short-term memory (LSTM) autoencoder multi-task learning model for PM2.5 time series prediction at multiple locations [28].
It has been shown that pictures taken in seriously polluted environments differ in hue and saturation from pictures taken in good weather [15]. This indicates that PM2.5 concentration is inversely related to saturation: when PM2.5 concentration rises to a certain level, the saturation is very low, and vice versa. Based on this non-linear relationship, we propose to train a neural network to predict the PM2.5 concentration from two features extracted from the weather image. We first pre-process the collected images using feature extraction, and then train an ANN model that maps the image features to the estimated PM2.5 concentration. The contributions of this paper are a new image-based PM2.5 concentration estimation framework and a non-linear model within this framework that a neural network learns from data.
The rest of this article is organized as follows. Section 2 introduces the methodology and implementation of the prediction model. Section 3 compares the prediction results with those of other methods, showing the improvements obtained with the model. In addition, the influence of model parameters and environmental conditions is tested in order to find the best solution and to describe the model's improvements. Finally, Section 4 summarizes the paper.

Feature extraction
Images are commonly converted between the RGB and HSV color spaces [29]. RGB space describes colors in terms of the amounts of red, green and blue present. HSV space describes colors in terms of hue, saturation and value; when the color description is expected to play an integral role, the HSV color model is often preferred over the RGB model [30]-[31]. The HSV model describes colors similarly to how the human eye tends to perceive color: RGB defines color as a combination of primary colors, whereas HSV describes color in terms of more familiar notions such as color, vibrancy and brightness. The saturation, contrast, hue and other variables of the picture in RGB space are all affected by PM2.5 concentration. This paper uses HSV color space to extract the features. Among these characteristics, saturation and brightness are more sensitive to changes in PM2.5, so they reflect PM2.5 concentration more clearly and effectively. Firstly, the saturation map $S(a, b)$ is extracted in HSV color space by Equations (1)-(3).
where $a$ and $b$ are the pixel indices in the horizontal and vertical directions, respectively. Next, a common index of image quality evaluation, namely the image information entropy, is calculated on the saturation map. The larger the image information entropy, the richer the information and the better the quality. Reference [15] proposed that the image information entropy of the saturation map has a strong relationship with the PM2.5 concentration of the weather image. The first feature, the image information entropy $F_1$, is calculated by Equation (4).
where $h$ and $w$ are the height and width of the saturation map, respectively, and $P[S(m, n)]$ is the probability distribution of $S(m, n)$, which we obtain using the method proposed in [15]. Then, we extract the second feature in the wavelet domain transformed from the saturation map. A scale-space orientation decomposition method and the Haar wavelet transform are applied. The saturation map is decomposed into five scales, and two subbands are used at each scale: the LH+HL subband $S_1$ and the HH subband $S_2$. The second feature, the overall entropy $F_2$ of the weather image, is extracted by Equations (5)-(7).
where $S_1^k$ is the LH+HL subband at the $k$th scale and $S_2^k$ is the HH subband at the $k$th scale; $h^k_{s1}$ and $w^k_{s1}$ are the height and width of the LH+HL subband at the $k$th scale; $h^k_{s2}$ and $w^k_{s2}$ are the height and width of the HH subband at the $k$th scale; and $P[S_1^k(m, n)]$ and $P[S_2^k(m, n)]$ are the probability distributions of the coefficients in the LH+HL and HH subbands at the $k$th scale.
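The feature extraction described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names are hypothetical, the probability distribution is estimated with a plain 256-bin histogram rather than the method of [15], the Haar transform is written by hand, and the subband entropies are simply averaged to form the overall entropy, since the exact combination rule is not recoverable from the text.

```python
import numpy as np

def saturation_map(rgb):
    """HSV saturation per pixel, S = (max - min) / max, for an (h, w, 3) RGB array."""
    rgb = np.asarray(rgb, dtype=np.float64)
    cmax = rgb.max(axis=2)
    cmin = rgb.min(axis=2)
    safe = np.where(cmax > 0, cmax, 1.0)  # avoid dividing by zero on black pixels
    return np.where(cmax > 0, (cmax - cmin) / safe, 0.0)

def shannon_entropy(values, bins=256):
    """Entropy in bits of a histogram of `values` (assumed estimator of P[.])."""
    values = np.asarray(values, dtype=np.float64).ravel()
    hist, _ = np.histogram(values, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def haar_level(x):
    """One level of a 2-D orthonormal Haar transform; returns (LL, LH, HL, HH)."""
    x = x[: x.shape[0] // 2 * 2, : x.shape[1] // 2 * 2]  # crop to even size
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0
    lh = (a + b - c - d) / 2.0
    hl = (a - b + c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, lh, hl, hh

def extract_features(rgb, scales=5):
    """Return (F1, F2): saturation-map entropy and an averaged wavelet entropy."""
    s = saturation_map(rgb)
    f1 = shannon_entropy(s)
    entropies = []
    ll = s
    for _ in range(scales):
        if min(ll.shape) < 2:  # image too small for another scale
            break
        ll, lh, hl, hh = haar_level(ll)
        entropies.append(shannon_entropy(lh + hl))  # S1: LH+HL subband
        entropies.append(shannon_entropy(hh))       # S2: HH subband
    f2 = float(np.mean(entropies))  # assumption: average over scales and subbands
    return f1, f2
```

Calling `extract_features(image)` on an RGB array returns the two scalars that serve as network inputs. An image must be at least 2 × 2 pixels for the wavelet feature to be defined.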

Model construction
The artificial neural network has become a well-known non-linear tool and has been widely adopted in a variety of areas in recent years [32]. It is also commonly used in image processing [33], [34]. A multi-layer perceptron (MLP) ANN is used in this paper. The MLP network has one input layer, one or more hidden layers and one output layer [35]. The two features discussed in the previous section, the entropy in the spatial domain of the saturation map and the overall entropy after wavelet transform, serve as the inputs of the ANN. The PM2.5 concentration is used as the output of the ANN. Each layer has several neurons, which are connected to the neurons in the adjacent layers. The model flow chart is shown in Figure 1. The weather image is fed into the feature extraction model, which yields the saturation-map entropy $F_1$ and the overall entropy $F_2$. The PM2.5 concentration is then calculated by the ANN model using Equation (8).
where $W_1$ and $W_2$ are the weight matrices, $b_1$ and $b_2$ are the biases, $f(\cdot)$ is the sigmoid function, and $x$ is the input vector composed of $F_1$ and $F_2$. The optimal values of $W_1$, $W_2$, $b_1$ and $b_2$ are obtained through the following training process.
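As an illustration of the symbols defined above, the forward pass of a one-hidden-layer MLP can be written as follows. The form $y = W_2 f(W_1 x + b_1) + b_2$ is an assumption based on the definitions in the text, with a tangent sigmoid (tanh) hidden activation and a linear output; the function name and the random example weights are hypothetical.

```python
import numpy as np

def mlp_predict(x, W1, b1, W2, b2):
    """Assumed form of the model: y = W2 @ f(W1 @ x + b1) + b2, with f = tanh."""
    hidden = np.tanh(W1 @ x + b1)  # hidden-layer activations
    return W2 @ hidden + b2        # linear output layer

# Example with 2 inputs (F1, F2), 15 hidden neurons and 1 output.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(15, 2))
b1 = rng.normal(size=15)
W2 = rng.normal(size=(1, 15))
b2 = rng.normal(size=1)
x = np.array([0.4, 0.7])  # normalized feature vector, illustrative values
y = mlp_predict(x, W1, b1, W2, b2)
```

The output `y` is a length-1 array holding the (normalized) concentration estimate for this untrained, randomly initialized network.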

ANN training
Back propagation (BP) is one of the most famous and commonly used algorithms in ANN training. With this algorithm, an ANN continuously adjusts its weights so that it outputs the predicted values for the given features.
The BP algorithm is a supervised learning method, so samples need to be prepared for training. Every sample includes the two-feature vector of the weather image, $I_i = (F_1^i, F_2^i)$, and the PM2.5 concentration of the same weather image, $P_i$, where $i$ is the index of the sample.
A summarized training procedure to obtain the optimal weights is as follows:
Step 1: Initialize the weight matrices and biases $W_1$, $W_2$, $b_1$, $b_2$ randomly.
Step 3: Compute the total error by Equation (9).
Step 4: Update the weights and biases.
Here the learning rate controls the step size of the update; $W_{old}$ is the parameter $W$ before updating; $W_{new}$ is the parameter $W$ after updating; $W = \{W_1, W_2, b_1, b_2\}$.
Step 5: If the stop condition is satisfied, stop training; otherwise go to Step 2.
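The training procedure can be sketched as a plain gradient-descent loop. This is a minimal illustration under stated assumptions rather than the training routine used in the paper: Step 2 is taken to be the forward pass, the total error is taken to be the mean squared error, the update is assumed to be `W_new = W_old - lr * gradient`, and the function name and default hyperparameters are hypothetical.

```python
import numpy as np

def train_mlp(X, y, hidden=15, lr=0.05, max_epochs=1000, tol=1e-6, seed=0):
    """Full-batch back propagation for a one-hidden-layer tanh MLP (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Step 1: random initialization of the weights, zero biases.
    W1 = rng.normal(scale=0.5, size=(hidden, d))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(1, hidden))
    b2 = np.zeros(1)
    history = []
    for _ in range(max_epochs):
        # Step 2 (assumed): forward pass over all samples.
        H = np.tanh(X @ W1.T + b1)        # shape (n, hidden)
        pred = (H @ W2.T + b2).ravel()    # shape (n,)
        err = pred - y
        # Step 3: total error, here the mean squared error (assumed form).
        loss = float(np.mean(err ** 2))
        history.append(loss)
        # Step 5: stop when the error goal is reached.
        if loss < tol:
            break
        # Step 4: back-propagate gradients and update the parameters.
        g_pred = (2.0 / n) * err[:, None]  # dLoss/dpred, shape (n, 1)
        g_W2 = g_pred.T @ H
        g_b2 = g_pred.sum(axis=0)
        g_Z = (g_pred @ W2) * (1.0 - H ** 2)  # through tanh
        g_W1 = g_Z.T @ X
        g_b1 = g_Z.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), history
```

Here `X` is an `(n, 2)` array of normalized features and `y` an `(n,)` array of normalized concentrations. The returned `history` holds the error per epoch, which is useful for checking that training is converging.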

Dataset
A total of 750 images are collected from a wide range of outdoor landscapes, such as parks, roads, lakes, temples and other buildings. Some of them contain objects like squares, flies and cars. The pictures are also shot at different times and locations. The resolutions differ as well, varying from 500 × 261 to 978 × 550. The PM2.5 concentration data for these images are provided by the U.S. Embassy in Beijing, China, and the concentration ranges from 1 to 423. For example, the PM2.5 concentration in Figure 2 is 153. It is hard to judge the exact value of the PM2.5 concentration with the human eye. All these characteristics ensure that the dataset is diverse and contains both features and noise. Of the 750 images, 500 are randomly chosen for model training, and the remaining 250 are held out for testing.
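A random 500/250 split of this kind can be reproduced with a seeded permutation of the image indices. The sketch below is only an illustration: the function name and the seed are hypothetical, and the paper does not state how its random selection was generated.

```python
import numpy as np

def split_indices(n_total=750, n_train=500, seed=0):
    """Return disjoint (train, test) index arrays drawn from a random permutation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_total)
    return perm[:n_train], perm[n_train:]

train_idx, test_idx = split_indices()
```

Fixing the seed keeps the split reproducible across runs, so that different network structures are compared on the same held-out images.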

Model training and evaluation
ANN models with different numbers of neurons in the hidden layer have different non-linear approximation abilities. In the model, all hidden layers use the tangent sigmoid transfer function, and the output layer uses the purelin linear transfer function. All the image features and the corresponding PM2.5 concentrations are normalized to (0, 1). The neural network learning is based on the NN tool in Matlab; the maximum number of epochs is set to 1000; data division is set to random; the minimum error is set to 1.00e-6. There are no established methods for determining the optimum number of hidden layers and neurons for the PM2.5 prediction task, so we decided the structure experimentally. Table 1 shows some of the selected trials with the corresponding R and RMSE for various network topologies. Finally, a three-layer ANN model with 15 neurons in the hidden layer was chosen to predict PM2.5 concentration in the following experiments. Three common metrics are used to evaluate the model: the linear correlation coefficient (LCC), the Kendall rank correlation coefficient (KRC) and the root mean square error (RMSE). LCC measures the linear relationship between the two variables and thus the prediction accuracy, KRC captures the agreement between their rankings, and RMSE evaluates the differences between the observed and real values. Higher LCC and KRC and lower RMSE indicate better prediction performance. The first metric is the LCC.
Here, $n$ is the number of elements in the vector. The second metric, KRC, is calculated using Equation (12).
where $M_c$ and $M_d$ are the numbers of concordant and discordant pairs in the dataset, respectively.
The third metric, the root mean square error, is calculated using Equation (13). It measures the deviation between the observed and the true values. In the equation, $X$ is the average value and $n$ is the number of samples.
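The three metrics can be computed as sketched below. The formulas are the standard textbook definitions (Pearson correlation for LCC, the tie-free Kendall form $2(M_c - M_d)/(n(n-1))$ for KRC, and the usual RMSE); they are assumed to correspond to the equations referenced in the text, and the function names are hypothetical.

```python
import numpy as np

def lcc(x, y):
    """Pearson linear correlation coefficient between two vectors."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

def krc(x, y):
    """Kendall rank correlation from concordant (M_c) and discordant (M_d) pairs."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    n = len(x)
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    prod = np.triu(dx * dy, k=1)  # each unordered pair counted once
    m_c = int((prod > 0).sum())
    m_d = int((prod < 0).sum())
    return 2.0 * (m_c - m_d) / (n * (n - 1))

def rmse(x, y):
    """Root mean square error between predictions and true values."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(np.sqrt(np.mean((x - y) ** 2)))
```

The pairwise KRC computation is quadratic in the number of samples, which is unproblematic for a 250-image test set.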

Testing results and comparison
After determining the model structure, we trained the model on the data of 500 images. This process was carried out in MATLAB 2014b on a laptop with a 64-bit operating system; the training process and results are shown in Figures 3 and 4. The trained model was then tested on the remaining 250 images, and the results are shown in Figure 5.

FIGURE 3
Training process

FIGURE 4
The performance of the trained model

This paper compares the performance of the ANN model with seven commonly used PM2.5 prediction models: PPPC [15], MNSS [36], ARISMC [37], NIQMC [38], ASIQE [39], APT [21] and BIQME [40]. PPPC is the first naturalness statistics model applied to PM2.5 estimation. ASIQE is a recently devised model based on naturalness statistics. Because PM2.5 has an immense impact on camera-captured pictures, NIQMC and BIQME, which are designed for the quality evaluation of contrast-altered pictures, are relevant to the picture-based prediction of PM2.5 concentration. ARISMC is devoted to sharpness measurement. Comparison results are shown in Table 2.

TABLE 2
Comparison results (surviving row fragment, model name missing): LCC 28.26%, rank 6; KRC 20.96%, rank 5; RMSE 84.27, rank 6

In this table, the LCC, KRC and RMSE performances are ranked and compared. The results show that the proposed ANN model performs best on all three metrics. In terms of LCC, only the PPPC and ANN methods exceed 55%. In terms of KRC, however, ANN performs better than PPPC. We believe that the ANN model can learn the non-linear mapping between the features and the PM2.5 concentration through machine learning. Compared with a model that uses a fixed non-linear function, the proposed model therefore improves on all the metrics.

CONCLUSION
This paper proposes an artificial neural network model for PM2.5 concentration prediction. Firstly, the HSV color space and wavelet decomposition are used to extract two highly related features, namely the entropy features of the image space and the transform domain. Secondly, 500 images with their ground-truth PM2.5 concentration data are used to train the prediction model. Thirdly, the prediction performance is compared with that of other models on three metrics, and the experimental results show that the proposed model greatly improves prediction accuracy with very little training data. The next step is to further explore the performance on a dataset that contains more information, such as different locations and background noise. In addition, the model's performance with different filters (which relate to the image features used for model mapping) also needs to be studied.