A novel soft sensing method using intelligent modeling method for solar irradiance and temperature in distributed PV power plant

Distributed photovoltaic (PV) power plants often lack solar irradiance monitoring devices, which significantly hinders crucial functions such as power forecasting, fault diagnosis, and performance calculation. To address this issue, a real‐time soft sensing method for solar irradiance in distributed PV plants is proposed. First, we investigated the typical relationship between solar irradiance, ambient temperature, and the electrical characteristics of PV cells. Based on this relationship, we used the small-sample modeling technique of the Genetic Algorithm‐Support Vector Machine to calculate the ambient temperature. Subsequently, a solar irradiance calculation model based on the backpropagation neural network was developed, taking the PV array voltage, current, calculated ambient temperature, and power as inputs. This approach enables the estimation of solar irradiance in distributed PV power plants through a simple and efficient calculation process. To demonstrate the reliability and flexibility of the algorithm, we tested it with data under various input conditions, such as different power plant configurations and seasons; the coefficient of determination for the proposed model reached 0.95. Overall, the proposed method offers a practical solution for soft sensing of solar irradiance in PV power plants, enabling accurate performance analysis and effective operation management without hardware investment.


| INTRODUCTION
In response to the greenhouse effect and the energy crisis, photovoltaic (PV) power has experienced rapid growth as a sustainable and environmentally friendly energy source in recent years. By the end of 2022, global solar PV capacity had increased by 240 GW, reaching nearly 1,185 GW [1]. Accurate real-time measurement of solar irradiance is crucial for PV power calculation, prediction, and performance evaluation of PV plants [2]. PV power exhibits randomness and fluctuations, making precise irradiance calculations and temperature measurements fundamental for short-term power prediction [3,4]. Additionally, irradiance and temperature information can be utilized for PV system performance assessment, fault diagnosis, and other purposes. Nemeş et al. [5] analyzed PV system performance by examining the correlation coefficient between PV output and global solar irradiance. Chao et al. [6] utilized power generation data, real-time solar irradiance, and module temperature to diagnose faults in PV systems. Because distributed photovoltaic (DPV) power stations are numerous and geographically dispersed, environmental conditions vary between PV systems at different locations, and accurately measuring the environmental parameters of these DPV power stations is challenging. Moreover, the accurate measurement of meteorological parameters typically requires sensors, which are sensitive to environmental changes and necessitate regular maintenance and calibration. Solar irradiance, an important meteorological parameter, is usually measured using radiometers, but the availability of solar radiometers in PV plants is limited by their high costs and specific maintenance and calibration requirements. Consequently, the calculation of irradiance has gained widespread attention and application as a means of indirectly calculating PV power production.
Irradiance calculation models are typically classified into physical models, empirical formulations, and machine learning models. Physical models are based on the theoretical formulation of solar irradiance. Liu and Jordan [7] proposed an isotropic model of the sky and derived a formula for total irradiance on tilted surfaces facing the equator. Ruth and Chant [8] and Datta [9] investigated the relationship between global horizontal solar irradiance (GHI) and the clearness index within the isotropic model. [10][11][12] For instance, Zheng and Wu [13] utilized Berlage's formula to calculate diffuse solar irradiance. In contrast, machine learning models exploit the internal correlations between solar irradiance and other factors such as historical solar irradiance and numerical weather prediction data. These models employ time series prediction and regression techniques for solar irradiance calculation [14]. Zhou et al. [15] proposed a multitask learning and Gaussian process regression model for predicting solar irradiance at multiple time scales (daily or monthly). [16][17][18] In comparative studies by Patel et al. [19] and Behrang et al. [20], the results obtained from ANN models were compared with those from conventional models, revealing significant improvements. AlShabi et al. [21] introduced a novel estimation method for PV solar cell models, employing a multigroup grey wolf optimizer that exhibits superior performance when compared to other algorithms. Koo et al. [22] developed an ANN model to calculate hourly GHI, utilizing data from meteorological satellites and meteorological imagers. However, the existing calculation methods have limitations in meeting the spatial and temporal resolution requirements of solar irradiance for DPV plants.
An increase in temperature can significantly reduce DPV power output and efficiency. Therefore, accurate modeling of PV temperature has become increasingly important for calculating PV system power output. Numerous studies have explored the temperature of PV modules, including the effects of temperature on module parameters, [23][24][25][26][27] factors influencing module surface temperature, [28][29][30] and methods for calculating module operating temperature. Xiaoyan et al. [31] and Skoplaki and Palyvos [32] investigated the impact of solar spectral distribution and module temperature on the outdoor performance of amorphous silicon and polycrystalline silicon PV modules. Kratochvil et al. [33] discussed the operating temperature of silicon-based solar cells and its effect on electrical performance. PV temperature models are often empirical or semiempirical, typically considering solar irradiance on the PV plane, ambient temperature, and wind speed, among other factors [34,35]. Zouine et al. [36] provided an overview of different methods for PV module temperature prediction by comparing theoretical models and experimental measurements. Dong et al. [37] employed a hybrid modeling approach assisted by a radial basis function neural network to estimate PV system temperature. These studies contribute to the understanding and modeling of PV module temperature, ultimately enhancing the accuracy of PV system power output calculations.
The relationship between the electrical characteristics of the PV system, solar irradiance, and temperature is crucial for understanding the performance of the PV system. The air temperature is primarily affected by geothermal energy and irradiance, with irradiance being the dominant factor in determining temperature variations at a given latitude. Considering the strong correlation between temperature, solar irradiance, and PV output, this paper proposes a real-time soft sensing method for solar irradiance and ambient temperature. The proposed method estimates the ambient temperature using the power, voltage, current, and module temperature of the PV modules, and takes the estimated temperature as an intermediate variable to enable soft sensing of irradiance in the PV plant. To validate the effectiveness of the method, it was applied to a DPV plant located in a mountainous region, where obtaining accurate meteorological measurements is a challenge. The paper is organized as follows. Section 2 gives an overview of the correlation between the electrical characteristics of the PV array, solar irradiance, and temperature, and examines actual operation statistics. Section 3 describes the calculation model and related algorithms employed in the proposed method. The stability of the calculations is verified in Section 4 through comparisons of results under different input conditions and during various seasons. The findings of this study provide valuable solutions for DPV plants or large ground PV power stations that face difficulties in obtaining accurate meteorological measurements.

| RELATIONSHIP BETWEEN PV OUTPUT CHARACTERISTICS, SOLAR IRRADIANCE, AND TEMPERATURE
This section is dedicated to gaining a comprehensive understanding of the factors related to solar irradiance measured in PV power generation through theoretical derivation, simulation, and analysis of actual data.

| Theoretical models
The equivalent circuit model of a PV cell is illustrated in Figure 1. It comprises two ohmic resistors, an ideal diode, a current source $I_{ph}$, and a peripheral resistor $R_L$, which together form a circuit [38]. Applying Kirchhoff's law to this circuit gives the current-voltage relation of the cell, in which $I_{ph}$ is the photo-generated current, which is proportional to both the area of the PV cell and the received solar irradiance. The photo-generated current represents the electron-hole pairs generated by the PV effect. The absorbed energy and the quantity of electron-hole pairs increase proportionally with solar irradiance. The photo-generated current is approximately equal to the short-circuit current $I_{sc}$, and it exhibits a similar temperature coefficient $\alpha_{I_{sc}}$. Consequently, the photo-generated current and the open-circuit voltage under arbitrary operating conditions can each be expressed as a function of irradiance and temperature, where $G$ is the solar irradiance (W/m²), the subscript ref denotes the standard test conditions, $T$ is the operating temperature (°C), $\alpha_{I_{sc}}$ is the temperature coefficient of the short-circuit current, $\beta_{U_{OC}}$ is the temperature coefficient of the open-circuit voltage, and $\delta$ is the correction factor for solar irradiance. The output characteristics of solar cells under various temperature and irradiance conditions are illustrated in Figure 2. Figure 2A,B demonstrate that the short-circuit current remains almost constant when the ambient temperature is varied at constant solar irradiance, whereas at constant ambient temperature the short-circuit current increases with the irradiance. Figure 2C,D illustrate that the open-circuit voltage decreases with increasing temperature and increases with increasing irradiance. It can be concluded that there is a close relationship between the electrical parameters of PV cells and meteorological conditions (ambient temperature, irradiance).
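The dependence of the photo-generated current and open-circuit voltage on irradiance and temperature can be illustrated numerically. The sketch below is not the paper's exact formulation: it assumes one widely used engineering form, $I_{ph} = I_{sc,ref}\,(G/G_{ref})\,[1 + \alpha_{I_{sc}}(T - T_{ref})]$ and $U_{OC} = U_{OC,ref}\,[1 + \beta_{U_{OC}}(T - T_{ref})]\,\ln(e + \delta\,(G - G_{ref}))$. The module ratings are the datasheet values quoted in this paper for the 265 Wp modules; the value of `DELTA` and the function names are hypothetical.

```python
import math

# Module ratings quoted in this paper for the 265 Wp modules.
I_SC_REF = 9.03      # short-circuit current at STC (A)
U_OC_REF = 38.6      # open-circuit voltage at STC (V)
ALPHA_ISC = 0.0006   # short-circuit current temperature coefficient (+0.06 %/degC)
BETA_UOC = -0.003    # open-circuit voltage temperature coefficient (-0.3 %/degC)

# Standard test conditions.
G_REF = 1000.0       # irradiance (W/m^2)
T_REF = 25.0         # temperature (degC)

# Assumption: illustrative irradiance correction factor, not taken from the paper.
DELTA = 5e-4         # (m^2/W)


def photo_current(g, t):
    """Photo-generated current (A), scaled linearly with irradiance (assumed form)."""
    return I_SC_REF * (g / G_REF) * (1.0 + ALPHA_ISC * (t - T_REF))


def open_circuit_voltage(g, t):
    """Open-circuit voltage (V) with a logarithmic irradiance correction (assumed form)."""
    return U_OC_REF * (1.0 + BETA_UOC * (t - T_REF)) * math.log(math.e + DELTA * (g - G_REF))


if __name__ == "__main__":
    for g, t in [(1000.0, 25.0), (1000.0, 45.0), (500.0, 25.0)]:
        print(f"G={g:6.0f} W/m^2, T={t:4.1f} degC -> "
              f"I_ph={photo_current(g, t):5.2f} A, U_oc={open_circuit_voltage(g, t):5.2f} V")
```

Under these assumptions the current responds mainly to irradiance and the voltage mainly to temperature, which is the qualitative behaviour described for Figure 2.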
Temperature and solar irradiance are interrelated meteorological characteristics rather than independent variables. Higher solar irradiance results in increased heating of the air, leading to a larger difference between the daily maximum and minimum air temperatures. Therefore, there exists a specific relationship between solar irradiance and air temperature. Three main types of formulas are frequently used for these models [39]. In these formulas, $\Delta T$ is the diurnal temperature variation range, $\Delta T_m$ is the monthly average diurnal temperature variation range, $R_s$ is the GHI, $R_a$ is the exoatmospheric solar irradiance, and $a$, $b$, and $c$ are the model parameters.
Based on the analysis above, a distinct mapping relationship is observed among the output characteristics of PV systems (particularly current and voltage), solar irradiance, and temperature. Formulas (6)-(8) also provide the correlation between irradiance and temperature. Therefore, in PV plants where meteorological measurement devices are unavailable, the ambient temperature can be estimated using electrical characteristics such as PV array current, voltage, power, and module temperature.
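As an illustration of temperature-based irradiance formulas of this kind, the sketch below implements two well-known models from the literature that use the same symbols: the Hargreaves form $R_s = a\,R_a\,\sqrt{\Delta T}$ and the Bristow-Campbell form $R_s = a\,R_a\,[1 - \exp(-b\,\Delta T^{c})]$. Whether these coincide with the paper's Formulas (6)-(8) is an assumption, and the default parameter values are typical literature values, not values fitted in this study.

```python
import math


def hargreaves(ra, delta_t, a=0.16):
    """Hargreaves-type estimate: Rs = a * Ra * sqrt(dT).

    ra      -- exoatmospheric irradiance (any unit; Rs is returned in the same unit)
    delta_t -- diurnal temperature range (degC)
    a       -- empirical coefficient (illustrative default)
    """
    return a * ra * math.sqrt(delta_t)


def bristow_campbell(ra, delta_t, a=0.7, b=0.005, c=2.4):
    """Bristow-Campbell-type estimate: Rs = a * Ra * (1 - exp(-b * dT**c)).

    a is the clear-sky transmittance limit; b and c shape the response
    to the diurnal temperature range (illustrative defaults).
    """
    return a * ra * (1.0 - math.exp(-b * delta_t ** c))


if __name__ == "__main__":
    ra = 30.0  # hypothetical exoatmospheric value, e.g. MJ/m^2/day
    for dt in (4.0, 9.0, 14.0):
        print(f"dT={dt:4.1f} degC  Hargreaves={hargreaves(ra, dt):5.2f}  "
              f"Bristow-Campbell={bristow_campbell(ra, dt):5.2f}")
```

Both forms increase monotonically with the diurnal temperature range, which is the physical link between air temperature and irradiance that the paragraph above relies on.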

| Analysis with actual operation data
To further investigate the relationship between the electrical parameters of the PV plant and solar irradiance and temperature, actual operational data from a rooftop PV plant located at North China Electric Power University was utilized for analysis. The PV plant has a total installed capacity of 1.63 MW and consists of approximately 6650 PV modules and 512 arrays. The geographical coordinates of the plant are 38°52′32.75″ N latitude and 115°29′56.14″ E longitude. The collected data includes output power, current, and voltage readings from the PV arrays, as well as measured data of solar irradiance, temperature, and wind speed. The PV station is equipped with one temperature sensor per array for gathering the operational temperature data. The PV combiner box incorporates current, voltage, and power acquisition to facilitate real-time monitoring of these parameters. Likewise, the weather station comprises an irradiance meter and a temperature sensor for continuous monitoring of irradiance and ambient temperature. Data is stored at 1-min intervals.
The correlation coefficient is a statistical indicator that describes the strength and direction of the linear relationship between two variables [40]. Figure 4 displays a scatter plot matrix illustrating the relationships between PV string current, string voltage, string power, ambient temperature, and solar irradiance. For instance, considering temperature and current, panel 1-A shows the scatter plot of temperature against current with its fitted curve, while panel 1-E shows the histogram and fitted curve of temperature. Similarly, panel 5-A shows the histogram and fitted curve of current, and panel 5-E presents the correlation coefficient between temperature and current. Similar patterns can be observed for the relationships between other variables.
Based on the analysis of Figure 4, the following conclusions can be drawn: 1. There is a strong positive correlation between solar irradiance and PV string output power. 2. There is a negative correlation between string voltage and ambient temperature, as well as between string voltage and irradiance.
These findings emphasize the interdependencies between the different variables and provide insights into their relationships within the PV system.
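The pairwise correlation coefficients reported in a scatter plot matrix of this kind can be computed as in the following minimal sketch. It uses the standard Pearson definition; the sample readings are made up for illustration and are not the plant data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical string-level readings (not measured data).
    irradiance = [200.0, 400.0, 600.0, 800.0, 1000.0]   # W/m^2
    power = [1.0, 2.1, 3.0, 4.1, 5.0]                    # kW
    voltage = [640.0, 632.0, 627.0, 618.0, 611.0]        # V
    print("irradiance vs power  :", round(pearson(irradiance, power), 3))
    print("irradiance vs voltage:", round(pearson(irradiance, voltage), 3))
```

A value close to +1 or −1 indicates a strong linear relationship, while the sign gives its direction.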

| ALGORITHM PRINCIPLE AND THE PROPOSED METHOD
In this study, we employed a Genetic Algorithm-Support Vector Machine (GA-SVM) methodology within the solar irradiance estimation model due to its proficiency in managing small-sample data sets. To mitigate the difficulties presented by the extensive size of the initial data set, the minimum redundancy maximum relevance (mRMR) algorithm was utilized. Ultimately, the integration of GA-SVM, mRMR, and the backpropagation neural network (BPANN) facilitated an efficient estimation of solar irradiance, effectively addressing the challenges of small sample sizes and large data volumes.

| GA-SVM
SVM is a robust machine learning algorithm, particularly effective in analyzing small-sample, high-dimensional data sets. It demonstrates a potent learning capacity, capable of deriving optimal solutions even from limited training data. Simultaneously, the GA-SVM method exhibits notable advantages in small-sample learning.
Hence, we have selected the GA-SVM method as the online temperature calculation model.
Given a linearly separable sample set $T = \{(x_i, y_i),\ i = 1, \dots, N\}$ with labels $y_i \in \{-1, +1\}$, the objective of the SVM is to identify a decision function $f(x) = \mathrm{sgn}(g(x))$ that accurately separates the training data set while also possessing strong generalization capabilities in the classification process. The classification hyperplane $w \cdot x + b = 0$ is constructed (where $w$ is the adjustable weight vector and $b$ is the offset) such that

$$y_i (w \cdot x_i + b) \ge 1, \quad i = 1, \dots, N.$$

The geometric distance from the hyperplane to the nearest samples of the training set is then

$$d = \frac{1}{\lVert w \rVert}.$$

The SVM model for solving the maximum-margin hyperplane problem can be formulated as the following constrained optimization problem:

$$\min_{w, b}\ \frac{1}{2} \lVert w \rVert^2 \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1, \quad i = 1, \dots, N.$$

When handling nonlinearly separable data, SVMs often employ kernel functions. These functions transform data from a low-dimensional space to a higher-dimensional one, enabling linear separation. The performance of an SVM is greatly influenced by the choice of kernel function and by its hyperparameters, especially the penalty factor C and the kernel parameter gamma (γ) [41]. To optimize these parameters, intelligent algorithms can be employed. One such algorithm is the GA, which is based on the principles of biological evolution, such as natural selection and survival of the fittest. This makes the GA a potent method for efficiently finding optimal solutions. The process of using a GA for SVM parameter optimization typically involves the following steps:
Input and normalize data: The first step involves taking the input sample data and normalizing it to ensure that all features contribute equally to the learning process.
Initialize population and parameters: The GA starts with a population of individuals (possible solutions), and the parameters (like C and γ for the SVM) are encoded and initialized.
Calculate fitness: Each individual in the population is evaluated based on its fitness, which measures how well it solves the optimization problem (in this case, how well the SVM parameters perform).
Selection, crossover, and mutation: Using methods like the roulette wheel selection, individuals with higher fitness are chosen for genetic operations like crossover and mutation, leading to the creation of a new generation of solutions.
Iterative optimization: The process is iteratively repeated, with each generation hopefully moving closer to the optimal set of parameters for the SVM.
Through this approach, the GA helps in fine-tuning the SVM model parameters to achieve better performance on the given data set.
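The steps above can be sketched as a small real-coded genetic algorithm that searches over $\log_{10} C$ and $\log_{10} \gamma$. This is an illustrative sketch, not the configuration used in the paper: the search bounds, population settings, and operators are assumptions, and `toy_fitness` is a hypothetical placeholder for what would in practice be a cross-validated score of an SVM trained with the candidate parameters.

```python
import random

# Assumed search ranges for log10(C) and log10(gamma).
BOUNDS = [(-2.0, 3.0), (-4.0, 1.0)]


def toy_fitness(genes):
    """Placeholder fitness (always > 0); peaks at C = 10, gamma = 0.1.

    In a real GA-SVM this would train an SVM with the decoded
    parameters and return a validation score.
    """
    log_c, log_gamma = genes
    return 1.0 / (1.0 + (log_c - 1.0) ** 2 + (log_gamma + 1.0) ** 2)


def roulette_select(population, scores, rng):
    """Pick one individual with probability proportional to its fitness."""
    pick = rng.uniform(0.0, sum(scores))
    running = 0.0
    for individual, score in zip(population, scores):
        running += score
        if running >= pick:
            return individual
    return population[-1]


def optimize(fitness, pop_size=30, generations=40, p_cross=0.8, p_mut=0.1, seed=0):
    """Return the best [log10(C), log10(gamma)] found by the GA."""
    rng = random.Random(seed)
    # Initialize the population uniformly inside the bounds.
    population = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        children = [list(best)]  # elitism: carry the best individual over unchanged
        while len(children) < pop_size:
            p1 = roulette_select(population, scores, rng)
            p2 = roulette_select(population, scores, rng)
            child = list(p1)
            if rng.random() < p_cross:
                # Arithmetic crossover: random convex blend of the two parents.
                w = rng.random()
                child = [w * a + (1.0 - w) * b for a, b in zip(p1, p2)]
            for k, (lo, hi) in enumerate(BOUNDS):
                if rng.random() < p_mut:
                    # Gaussian mutation, clipped to the search bounds.
                    child[k] = min(hi, max(lo, child[k] + rng.gauss(0.0, 0.3)))
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best


if __name__ == "__main__":
    log_c, log_gamma = optimize(toy_fitness)
    print(f"C ~ {10 ** log_c:.3g}, gamma ~ {10 ** log_gamma:.3g}")
```

Searching in logarithmic space is a common design choice because useful values of C and γ typically span several orders of magnitude.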

| mRMR method for feature selecting
When performing feature selection, it is common to use metrics such as correlation and mutual information to evaluate the relationship between features and labels. However, solely selecting the k features with the highest correlation values may not yield the optimal subset, potentially disregarding important information. In scenarios where there is substantial correlation among features, there might be redundant information for determining the class, resulting in increased redundancy within the selected subset. To tackle this issue, the mRMR algorithm is utilized [42]. Its purpose is to strike a balance between minimizing redundancy and maximizing the correlation between features and labels.
The maximum correlation is defined as

$$\max D(S, c), \quad D = \frac{1}{|S|} \sum_{x_i \in S} I(x_i, c)$$

Minimum redundancy is defined as

$$\min R(S), \quad R = \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i, x_j)$$

where $S$ denotes the feature set, $c$ denotes the target category, $I(x_i, c)$ denotes the mutual information between feature $i$ and the target category $c$, and $I(x_i, x_j)$ is the mutual information between feature $i$ and feature $j$.
The feature selection criterion for mRMR is

$$\max \Phi(D, R), \quad \Phi = D - R$$

For a feature set $F_m$ with $m$ features, $n$ features are selected as a feature subset. Suppose that $n - 1$ features have been selected according to the mRMR principle and form the subset $S_{n-1}$, so that the remaining set is $F_m - S_{n-1}$. The $n$th feature, chosen from $F_m - S_{n-1}$, must still satisfy the mRMR principle, that is, it must satisfy the condition

$$\max_{x_j \in F_m - S_{n-1}} \left[ I(x_j, c) - \frac{1}{n-1} \sum_{x_i \in S_{n-1}} I(x_j, x_i) \right]$$
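The incremental selection rule can be sketched as a greedy loop. The example below is a minimal illustration for discrete (or pre-binned) features using the difference form of the criterion; continuous quantities such as voltage or power would first need to be discretized, which is an assumption of this sketch. The feature names and toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X, Y) in nats for two sequences of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (count / n) * math.log(count * n / (px[x] * py[y]))
        for (x, y), count in joint.items()
    )


def mrmr(features, target, n_select):
    """Greedy mRMR: features maps a name to a list of discrete values."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    # First feature: maximum relevance to the target.
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(n_select, len(features)):
        def score(name):
            # Relevance minus mean redundancy with the already selected features.
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        candidates = [name for name in features if name not in selected]
        selected.append(max(candidates, key=score))
    return selected


if __name__ == "__main__":
    # Toy data: the target is determined jointly by "a" and "b";
    # "a_copy" duplicates "a" and "noise" is unrelated to the target.
    features = {
        "a":      [0, 0, 0, 0, 1, 1, 1, 1],
        "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],
        "b":      [0, 0, 1, 1, 0, 0, 1, 1],
        "noise":  [0, 1, 0, 1, 0, 1, 0, 1],
    }
    target = [0, 0, 1, 1, 2, 2, 3, 3]
    print(mrmr(features, target, n_select=2))  # ['a', 'b']
```

In the toy data, `a_copy` is as relevant as `a` but fully redundant with it, so the second pick is `b`, which shows why ranking by relevance alone would waste a slot.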

| Proposed method
To accurately estimate the ambient temperature and solar irradiance within the PV plant, a composite calculation approach was utilized, consisting of five sequential steps. The process flow is depicted in Figure 5.
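The core data flow of the composite approach, in which the estimated temperature from the first model becomes an input of the second, can be sketched as follows. The two `demo_*` functions are hypothetical stand-ins with made-up coefficients; in the paper their roles are played by the trained GA-SVM and BPANN models, and the exact input sets used here are an assumption.

```python
from typing import Callable, Sequence, Tuple

# A model maps a feature vector to a single estimate.
Model = Callable[[Sequence[float]], float]


def two_step_estimate(
    voltage: float,
    current: float,
    power: float,
    temperature_model: Model,
    irradiance_model: Model,
) -> Tuple[float, float]:
    """Return (estimated ambient temperature, estimated irradiance)."""
    # Step 1: ambient temperature from electrical measurements.
    t_hat = temperature_model([voltage, current, power])
    # Step 2: irradiance from the same measurements plus the estimated temperature.
    g_hat = irradiance_model([voltage, current, power, t_hat])
    return t_hat, g_hat


def demo_temperature_model(x: Sequence[float]) -> float:
    """Hypothetical stand-in for the trained temperature model."""
    voltage, current, _power = x
    return 40.0 - 0.03 * voltage + 0.5 * current


def demo_irradiance_model(x: Sequence[float]) -> float:
    """Hypothetical stand-in for the trained irradiance model."""
    _voltage, current, _power, t_hat = x
    return 110.0 * current + 0.5 * t_hat


if __name__ == "__main__":
    t_hat, g_hat = two_step_estimate(
        600.0, 8.0, 4800.0, demo_temperature_model, demo_irradiance_model
    )
    print(f"T_hat = {t_hat:.1f} degC, G_hat = {g_hat:.1f} W/m^2")
```

Passing the models in as callables keeps the chaining logic independent of how each estimator is implemented or trained.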

| Data processing
The data set used in this study encompasses the operational data collected from the practical PV system described in Section 2. It includes variables such as solar irradiance, wind speed, PV module temperature, and PV array output power. During the data analysis, we considered the variability of these variables while also accounting for measurement uncertainty.
Table 1 provides the key parameters of the PV plant data, including minimum, nominal, and maximum values.To address any errors in the data collection process, necessary measures were taken to handle missing data and replace outlier values.

| mRMR feature selecting
Due to the high dimensionality and computational complexity of the collected data, the mRMR algorithm is employed in this paper to identify relevant features from the original data set. This algorithm is utilized to determine the features with the highest relevance (Max-Relevance) and the lowest redundancy (Min-Redundancy).
FIGURE 5 Composite calculation flow chart.

| Establish GA-SVM model
The GA-SVM model uses electrical features as input variables, including string voltage, current, power, and wind speed. The parameters for the GA-SVM model are described in detail in Table 2.

| Establish BPANN model
For the BPANN model, we selected voltage, current, and measured ambient temperature as input variables, and the measured irradiance as the output variable. Following a comparison of different parameter settings, the optimal parameters for both the GA-SVM and BPANN models are specified in Table 2. The neural network structure for the BPANN model is depicted in Figure 6.

| Solar irradiance and temperature calculation
The GA-SVM model was used to estimate the ambient temperature, while the BPANN model was employed to calculate solar irradiance. To validate the accuracy of the model's performance, a comparison was made with experimental data using three evaluation metrics: mean absolute error (MAE), coefficient of determination (R²), and root mean square error (RMSE) [43].
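The three evaluation metrics can be computed as in the sketch below, which uses their standard definitions. The sample values are illustrative only and are not results from this study.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative irradiance values in W/m^2 (not measured data).
    actual = [100.0, 300.0, 500.0, 700.0]
    predicted = [110.0, 290.0, 520.0, 690.0]
    print(f"MAE  = {mae(actual, predicted):.2f} W/m^2")
    print(f"RMSE = {rmse(actual, predicted):.2f} W/m^2")
    print(f"R2   = {r2(actual, predicted):.4f}")
```

MAE and RMSE carry the unit of the estimated quantity, while R² is dimensionless, which is why it is convenient for comparing temperature and irradiance results side by side.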

| Inputs selecting
To enhance the computational efficiency of the model and minimize data redundancy, feature selection using mRMR was conducted during the temperature and irradiance calculation stages. In the selection procedure, P is the output power of the power plant, v is the electrical feature set, S is the filtered feature set, and F is the feature set to be filtered.
The impact of different input data dimensions on the outcomes is presented in Tables 3 and 4. It is observed that when employing an input dimension of k = 10 in Step 2, the calculated temperature aligns more closely with the expected/true value.Likewise, when utilizing an input dimension of k = 10 in Step 3, the calculated irradiance demonstrates greater proximity to the expected/true value.

| Different input conditions
To showcase the accuracy of the proposed model, the two-step calculation model was compared with the direct calculation model. The direct calculation model estimates irradiance without performing ambient temperature calculations. Figure 7 compares the experimental values with those obtained from both the two-step model and the direct calculation model.
TABLE 3 Effects of mRMR on the outcomes of temperature calculations.

Table 5 presents the accuracy results of both the two-step computational model and the direct computational model. An analysis of the error distribution for these two methods is depicted in Figure 8. The error distribution of the two-step method is concentrated within the interval [−34, 34], while that of the direct method is concentrated within the interval [−40, 40]. These findings indicate that the two-step method has a relatively smaller calculation error than the direct method. Overall, the results demonstrate that the proposed two-step model aligns well with the experimental data.
Figure 10 displays a scatter plot showcasing the relationship between the actual and calculated temperature values. Figure 11 presents a scatter plot comparing the actual and calculated irradiance values. Additionally, Table 6 offers valuable insights into the influence of data from various seasons on the estimation results.

| Calculation error analysis under typical seasonal conditions
The proposed two-step method shows better performance in estimating temperature and irradiance, with the minimum MAE below 1 °C for temperature estimation and below 20 W/m² for irradiance. The accuracy is especially high during the summer and autumn seasons, as evidenced by a coefficient of determination of 0.97. However, during winter, further refinement of the model is needed, as indicated by relatively large RMSEs for the temperature and irradiance estimates. It is worth noting that the estimation error is more pronounced during periods of high actual irradiance, which is especially noticeable during the summer season.

| Adaptability analysis for mountain DPV plant
This study aims to evaluate the effectiveness of the proposed method by analyzing its adaptability using actual operational data from a mountain PV plant. The PV plant, situated in a mountainous region, is a distributed group with a cumulative installed capacity of 64 MWp. Within this group, there are five DPV plants, each equipped with a dedicated weather station that measures solar irradiance and temperature. The primary real-time measurement data for the study include both string voltage and string current. By using these input parameters and considering ambient temperature and irradiance as output variables, meteorological estimation is conducted for the PV plants.
Figure 12 displays the temperature and irradiance curves for the five DPV plants, comparing the estimated and actual data. Each row of subplots compares the predicted and actual temperature and irradiance values for the corresponding power plant. Notably, the estimation error for temperature is more prominent during nighttime, when the PV plant's voltage and current are zero, which makes it challenging to accurately predict the temperature drop. The evaluation results are depicted in Figure 13 and Table 7, with the horizontal axis representing the plant label and the vertical axis representing the evaluation index. Based on these findings, it can be concluded that the calculation accuracy for the five DPV plants is generally high.

| CONCLUSION
A common challenge in DPV plants is the absence of a meteorological data monitoring system, which significantly affects power calculation, fault diagnosis, and scheduling plan adjustments. This study addresses this issue by proposing a method to estimate environmental factors within the PV plant, specifically ambient temperature and solar irradiance. The method utilizes a two-step solar irradiance soft sensing model that combines the GA-SVM and BPANN algorithms. The model was verified in different power plants during various seasons, and the results show a coefficient of determination of 0.95, indicating a strong fit for the model.
The novelty of this study includes: 1. The investigation of the intrinsic relationship between ambient temperature and irradiance. 2. The proposed model enables the acquisition of meteorological data while reducing the costs associated with installing, operating, and instrumenting measuring equipment. 3. The findings of this research contribute to the advancement and enhancement of DPV output power forecasting and smart operational applications with the soft sensing method.
The methodology presented in this paper establishes a foundation for the online calculation of environmental parameters specific to DPV plants. In the future, the methodology will incorporate additional data sources, including numerical weather forecasts, to enhance the precision of the modeling process.

FIGURE 2 Effect of temperature and irradiance on the electrical characteristics and efficiency of PV modules. (A) Electrical characteristics of solar cells at different temperatures, (B) electrical characteristics of solar cells under different irradiances, (C) irradiance and short circuit current, (D) irradiance and open circuit voltage, (E) temperature and short circuit current, and (F) temperature and open circuit voltage.

Figure 3 depicts the structural diagram of the PV plant used in this study. The parameters of the PV modules utilized in the power station include a peak power rating of 265 Wp, an efficiency of 16.19%, an open circuit voltage of 38.6 V, a short circuit current of 9.03 A, an open circuit voltage temperature coefficient $\beta_{U_{OC}}$ of −0.3%/°C, and a short circuit current temperature coefficient $\alpha_{I_{sc}}$ of 0.06%/°C.
The mRMR-based feature selection process used in the temperature and irradiance calculation stages involves the following steps. 1. Calculate the mutual information $I(P, v)$ between the output power and the electrical characteristics. 2. When $I(P, v)$ reaches its maximum value, set $S = \{v\}$ and $F = F - \{v\}$. 3. Find the feature $v$ in $F$ that satisfies the mRMR criterion according to the incremental search algorithm, and update $S = S \cup \{v\}$ and $F = F - \{v\}$. 4. Verify whether the number of features in the subset has reached n. If yes, output the subset S. If not, repeat step 3 and continue the search until the number of features reaches n.

Figure 9 illustrates a time series plot of the actual and calculated temperature and irradiance values for different seasons.

FIGURE 8 Comparison of error distribution of two-step method and direct method.
FIGURE 9 Calculated curve of temperature and irradiance under typical seasonal conditions.

FIGURE 10 Scatter plot of calculated temperatures under typical seasonal conditions. (A) Spring, (B) summer, (C) autumn, and (D) winter.
FIGURE 11 Scatter plot of calculated irradiance under typical seasonal conditions. (A) Spring, (B) summer, (C) autumn, and (D) winter.
Figure 14 visualizes the distribution of the irradiance calculation error, demonstrating a skewness value of −0.59. As the actual irradiance value increases, the distribution of the irradiance calculation error exhibits a right-skewed pattern.

FIGURE 13 Error distribution.
Assuming the PV outputs at different times are $P_{solar1}$ and $P_{solar2}$, respectively, and the expected values over the statistical period are
Model parameters.
TABLE 1 Calculation errors of different seasonal models.
FIGURE 12 Calculated curves of temperature and irradiance for mountain PV plants.
TABLE 7 Model calculation error.
FIGURE 14 Distribution of irradiance calculation errors.
NOMENCLATURE
$I_L$, $I_{ph}$, $I_D$, $I_{sh}$, $I_0$, $I_{pv}$
$R_a$, $R_s$, $G_{ref}$: irradiance (W/m²)
$T_C$, $\Delta T$, $\Delta T_m$, $T_{pv}$