Blind photovoltaic modeling intercomparison: A multidimensional data analysis and lessons learned

The Photovoltaic (PV) Performance Modeling Collaborative (PVPMC) organized a blind PV performance modeling intercomparison to allow PV modelers to blindly test their models and modeling ability against real system data. Measured weather and irradiance data were provided along with detailed descriptions of PV systems from two locations (Albuquerque, New Mexico, USA, and Roskilde, Denmark). Participants were asked to simulate the plane‐of‐array irradiance, module temperature, and DC power output from six systems and submit their results to Sandia for processing. The results showed overall median mean bias (i.e., the average error per participant) of 0.6% in annual irradiation and −3.3% in annual energy yield. While most PV performance modeling results seem to exhibit higher precision and accuracy as compared to an earlier blind PV modeling study in 2010, human errors, modeling skills, and derates were found to still cause significant errors in the estimates.

PV performance estimates vary because their accuracy depends on the analysis and modeling pipeline, which commonly includes irradiance transposition, module temperature, and power output modeling. Different models and their combinations may result in varying accuracies, and different assumptions for performance loss factors (derates) will significantly affect the energy yield estimations. These may also depend on the PV plant configuration, module type, and geographical location. Furthermore, the modeler's skills and experience can affect the resulting accuracy.
New PV performance models are continuously being developed whereas existing models are frequently updated. However, only a limited number of them have been evaluated independently from multiple aspects against high-quality field datasets (e.g., in previous studies [2][3][4][5][6][7]). When an approach is tested against known datasets, the modeler might introduce a bias, which directly influences the approach's validity, reproducibility, and applicability for different systems. In such cases, blind intercomparisons are useful for benchmarking analysis pipelines and establishing the state of the art in PV performance modeling. The PV Performance Modeling Collaborative (PVPMC) was founded based on the outcomes of the blind PV modeling study in 2010. 8,9 Previous intercomparisons of PV modeling approaches include that of Friesen et al., 10 wherein time-series plane-of-array irradiance (Gpoa) and module temperature (Tmod) data were circulated to eight European institutions. These participants were asked to simulate the module-level performance of five PV technologies in seven climates, and it was found that the group's energy yield predictions agreed within ±5%. Moser et al. analyzed the long-term yield predictions of six expert modelers for a PV system in an Italian and an Australian site. 11 These modelers were required to independently obtain meteorological data for their simulations, which, for the Italian site, led to 6% differences in global horizontal irradiance (GHI), 20% differences in Gpoa, and ultimately nearly 30% differences in AC energy. Most recently, Vogt et al. 12 conducted an intercomparison of energy rating calculations per IEC 61853-3 13 with nine European institutes. Energy rating differences of 14% were found in the first blind comparison round. It took five rounds of calculations, and discussions among the participants, for the nine participants' calculations to agree within 0.1%. Ultimately, Vogt et al. 12 demonstrated how user-induced variability can be reduced when modelers have clear procedures for implementing key steps of the PV model chain.
To provide an opportunity for PV modelers to test their models and modeling ability against high-quality, real system data and to help provide a baseline quantifying the variability of different models and modelers, PVPMC organized a new blind PV performance modeling comparison in 2021.

| METHODOLOGY
The blind PV modeling comparison was announced in July 2021 through the PVPMC email list (https://public.govdelivery.com/accounts/USDOESNLEC/subscriber/new?topic_id=USDOESNLEC_185). The data and document describing the exercise were downloaded >600 times. Sandia received 29 submissions from 28 participants with various modeling pipelines, including new commercial software. This effort represents 26 institutions from 12 different countries. Air temperature was measured using two Climatronics Aspirated Shield Temperature Sensors. Module temperature was measured using back-of-module resistance temperature detectors (RTDs) on one module of each string. Voltage and current were measured at the string level for all systems using voltage dividers and Manganin shunts. The Roskilde systems and measurement setup in scenarios 3-6 are described by Riedel-Lyngskaer et al. 14 The participants had access to general instructions, hourly averaged weather data from the locations (Gpoa was not included), module and inverter spec sheets, system designs, and test reports.

| Scenarios and data
The test reports were only available for the systems in Albuquerque (i.e., S1 and S2) and included IEC 61853-1 15 matrix data, IEC 61853-2 16 incidence angle modifier (IAM) data, nominal module operating temperature (NMOT), and PAN (Panneau Solaire) files. A frequently asked questions (FAQ) section was regularly updated on the PVPMC website to ensure everyone had access to the same information. Modeling results were collected and handled by Sandia, ensuring anonymity. The participants knew their "participant number" only, and they had the option to exclude their name from any publication. This paper's author numbers and order are unrelated to the participant numbers in the figures.
The participants were asked to simulate Gpoa (in W/m²), module temperature (in °C), and system DC power output (Pmp in W). Some participants resubmitted their estimates to correct "minor" mistakes such as modeling 48 modules instead of 12, submitting incorrect units (e.g., kWh/kWp or kW instead of W), reporting direct irradiance instead of global irradiance, and so forth.
To ensure that non-physical irradiance values (i.e., sun below the horizon, below/above physical minimum/maximum, static measurements, and inconsistent irradiance components) were ignored, the datasets were filtered based on version 2 of the Baseline Surface Radiation Network (BSRN) Global Network-recommended quality control tests. 17 Furthermore, datapoints during days with snow were filtered out from both locations. The data from Roskilde include additional filters to ensure the proper operation of single-axis tracking (by comparing the tilt angles of neighboring trackers) and that the data acquisition system was online. All values lower than 100 W/m² in frontside Gpoa, lower than 50 W in DC power output, and outside −5°C to 45°C in ambient temperature were also filtered out.
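As an illustration, the range filters above can be applied with a few pandas operations. This is only a sketch under assumed column names (gpoa_front, p_dc, t_amb), not the study's actual quality-control pipeline, which additionally implements the full BSRN tests and the snow and tracker filters:

```python
import pandas as pd

def apply_basic_filters(df):
    """Apply the range filters described above to an hourly dataset.

    Illustrative sketch only; column names are hypothetical. Rows with
    low front-side irradiance, low DC power, or implausible ambient
    temperatures are dropped.
    """
    mask = (
        (df["gpoa_front"] >= 100.0)            # front Gpoa >= 100 W/m2
        & (df["p_dc"] >= 50.0)                 # DC power >= 50 W
        & (df["t_amb"].between(-5.0, 45.0))    # ambient temp in [-5, 45] C
    )
    return df[mask]
```

In practice these filters would be applied after the BSRN consistency checks, since the range tests alone cannot catch, for example, inconsistent irradiance components.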
The validation datasets are available online in open access at two locations. The first is the website of the PV Performance Modeling Collaborative at https://pvpmc.sandia.gov/. The second is the Duramat Data Hub at https://datahub.duramat.org/dataset/pvperformance-modeling-data (doi: https://doi.org/10.21948/1970772). 18

| Statistics
Based on the participants' affiliations, they were grouped into the following categories: (1) Commercial, (2) Research, (3) Software, and (4) Student. The commercial category includes consulting and engineering companies, independent engineers, owners, utilities, and producers. Figure 1 shows the percentage breakdown per category. Some models/software can reveal who the participant is when used only once, and other models did not achieve an adequate statistical sample. To ensure anonymity and focus on approaches with significant participation, the following categories were created: 1. "Other model" refers to known models used by a single participant (e.g., pvlib-pvwatts from pvlib-python 19 ); 2. "Custom model" refers to "in-house" customized models that are not available to the public (these are models developed and used by some independent engineers); 3. "Other software" includes software used by a single participant (e.g., Archelios, 20 PVSol 21 ). In the case of temperature modeling, more than 60% of the participants used the PVsyst (Tcell), 24 Sandia Array Performance Model (SAPM), 25 and Nominal Operating Cell Temperature (NOCT) 26 models. It should be noted here that "PVsyst (Tmod)" refers to the temperature model in PlantPredict 27 ; this model is the same as in PVsyst, but its result is then converted to module temperature using the equation developed by Sandia. 25 Finally, close to 50% of the participants used the PVsyst 28 and System Advisor Model (SAM) 26 software packages for PV performance modeling.
As mentioned in subsection 2.1, the S1 and S2 systems included PAN files, IEC 61853-1 matrix, IAM, and NMOT data. As seen in Figure 3, most participants used the PAN files, mainly due to the high percentage of PVsyst users. Only 24.6% of the participants used the provided IEC 61853-1 data. The IAM and NMOT data were used by half of the participants. The percentages in Figure 3 are also categorized as a function of individual models and categories.

| RESULTS
To rank the participants, the mean absolute percentage error (MAPE) was used to compare the annual irradiation (Figure 4A) and energy yield (Figure 4B) estimates. The median MAPE values in annual irradiation and energy yield estimations were close to 2% and 5.5%, respectively.
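The ranking metric itself is straightforward. A minimal sketch (function and variable names are illustrative, not the study's code): for each scenario a participant modeled, the absolute relative error of the annual total is taken and then averaged across scenarios.

```python
import numpy as np

def mape(measured, modeled):
    """Mean absolute percentage error between annual totals.

    `measured` and `modeled` hold one annual value per scenario; the
    result is the mean of |modeled - measured| / measured, as a
    percentage.
    """
    measured = np.asarray(measured, dtype=float)
    modeled = np.asarray(modeled, dtype=float)
    return 100.0 * np.mean(np.abs(modeled - measured) / measured)

# Hypothetical annual energy yields (kWh) for three scenarios
score = mape([1500.0, 1450.0, 1600.0], [1530.0, 1420.0, 1570.0])
```

Because the errors are taken in absolute value, a participant cannot benefit from overestimation in one scenario canceling underestimation in another, which is why MAPE rather than mean bias was used for ranking.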
Interestingly, the participants with the lowest MAPE (≪1%) in the annual irradiation estimation (i.e., P23, P2, and P22) exhibited high MAPE in annual energy yield estimation (ranging from 8.2% to 68.7%; the y-axis limits were truncated to 7% for clarity). Note that not all participants modeled all six scenarios.

| Irradiance modeling
A PV performance modeling pipeline always begins with irradiance. In this study, the participants had the measured global horizontal, direct normal, and diffuse horizontal irradiances and were asked to apply transposition models to estimate Gpoa.
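For reference, the simplest such transposition approach, the isotropic-sky model discussed below, combines the three measured components through the angle of incidence, a sky view factor, and a ground-reflected term. This is a generic textbook sketch with an assumed albedo, not any participant's implementation:

```python
import math

def isotropic_poa(ghi, dni, dhi, zenith_deg, tilt_deg,
                  surf_azimuth_deg, sun_azimuth_deg, albedo=0.2):
    """Isotropic-sky transposition of irradiance components to the plane of array.

    Beam irradiance is projected via the angle of incidence, sky diffuse
    is scaled by the isotropic tilt view factor, and ground-reflected
    diffuse is derived from GHI and an assumed albedo. (The Perez model
    used by most participants treats sky diffuse anisotropically.)
    """
    z = math.radians(zenith_deg)
    t = math.radians(tilt_deg)
    # cosine of the angle of incidence between sun vector and panel normal
    cos_aoi = (math.cos(z) * math.cos(t)
               + math.sin(z) * math.sin(t)
               * math.cos(math.radians(sun_azimuth_deg - surf_azimuth_deg)))
    beam = dni * max(cos_aoi, 0.0)
    sky = dhi * (1.0 + math.cos(t)) / 2.0
    ground = ghi * albedo * (1.0 - math.cos(t)) / 2.0
    return beam + sky + ground
```

For a horizontal plane the expression collapses to DNI·cos(zenith) + DHI, i.e., it reproduces GHI when the components are consistent, which is one of the consistency checks behind the BSRN quality-control tests mentioned earlier.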
Figure 5 shows the diurnal front (top row) and rear (bottom row) Gpoa estimates by all participants. One of the participants in S3 appears to have simulated a fixed-tilt system instead of tracking. As expected, front Gpoa is not as difficult to predict, whereas problems arise when modeling the rear Gpoa, where minimum and maximum differences above 100% were observed. It is worth mentioning that despite these high differences in rear Gpoa, this component represents <10% of the total irradiance. Minimum and maximum percentage differences from the measured front Gpoa at noon ranged from −11% (S5) to +61.3% (S3); the latter was due to a participant who simulated fixed-tilt rather than tracked Gpoa. If this participant is excluded, the minimum and maximum differences would range from −11% to +1.99%. Minimum and maximum percentage differences from the measured rear Gpoa at noon ranged from −99.7% (S4) to +149.4% (S6).

Regarding module temperature, it is worth noticing that all but one of the PVsyst (Tcell) users exhibit nearly identical distributions, demonstrating consistent calculations within the most popular software. However, it should be noted that while the median bias of this model is small, the comparison is against measured module temperatures, while PVsyst (Tcell) only calculates cell temperature.

| OBSERVATIONS AND LESSONS LEARNED
For the systems in Albuquerque (i.e., S1 and S2), the participants had module information that is not commonly available. PAN files might be available in databases but, in this study, the PAN files were specific to the modules in S1 and S2, that is, not generic representative PAN files. The objective here is to observe how the various assumptions or usage of additional information affected the results and to describe the lessons learned from this study.
It should be expected that as module temperature increases, efficiency will decrease. To examine whether this holds true for all participants' temperature coefficient inputs, these trends were reverse calculated. This was done by taking a subset of data for modeled Gpoa from 800 to 1200 W/m² and wind speed lower than 10 m/s and fitting a regression model for modeled power against the module temperature by each participant (see Figure 9). Qualitatively, it can be observed that some participants assumed lower temperature dependency, while others assumed positive temperature coefficients. The latter might be due to an error in applying the sign in the equation; another speculation could be that the participant(s) may have used the temperature coefficient for current instead of power. Furthermore, some participants miscalculated the system size by either over- or under-sizing the number of PV modules. Therefore, human errors are not uncommon in PV performance modeling. Another common confusion observed during this blind PV modeling comparison was that many participants interchangeably used the U0 and U1 values of the Faiman model with the Uc and Uv values of the PVsyst (Tcell) model. Although these models are similar, they are not the same, and therefore, the U parameters should not be used interchangeably.
Methods for parameter translation between temperature models (e.g., translating U0, U1 to Uc, Uv) have recently been published elsewhere, 30 and functions are available in pvlib-python as pvlib.temperature.GenericLinearModel.
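The distinction can be made concrete. The Faiman model computes the temperature rise as Gpoa/(U0 + U1·v), whereas the PVsyst model uses Gpoa·α·(1−η)/(Uc + Uv·v); equating the two shows the parameter sets differ by the factor α(1−η). The sketch below is a hand-rolled illustration with assumed placeholder values for absorptance (alpha) and module efficiency (eta); pvlib's GenericLinearModel performs such translations more generally:

```python
def faiman_to_pvsyst(u0, u1, alpha=0.9, eta=0.2):
    """Translate Faiman (U0, U1) into equivalent PVsyst-style (Uc, Uv).

    Faiman: dT = Gpoa / (U0 + U1*v)
    PVsyst: dT = Gpoa * alpha * (1 - eta) / (Uc + Uv*v)
    Equating the two gives Uc = alpha*(1-eta)*U0 and Uv = alpha*(1-eta)*U1,
    so the raw numbers are NOT interchangeable between the models.
    """
    factor = alpha * (1.0 - eta)
    return factor * u0, factor * u1

# With the common Faiman defaults U0=25, U1=6.84, the equivalent PVsyst
# parameters are noticeably smaller.
uc, uv = faiman_to_pvsyst(25.0, 6.84)
```

Plugging Faiman's U0 and U1 directly into PVsyst's Uc and Uv therefore under-represents the heat loss by roughly the factor α(1−η), biasing the modeled module temperature high.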
The modeled temperature rise (i.e., the difference between modeled module temperature and measured ambient temperature) as a function of modeled Gpoa is shown in Figure 10. Robust regression 31 was used to de-weight outliers (dashed lines), and the black dashed lines correspond to module measurements (only available for S1-S3) for wind speed <2 m/s and module temperature >0°C. The heat loss coefficients of the respective models were frequently set too high. This resulted in maximum temperature differences between participants and field data of up to ~15°C at 1000 W/m² in Albuquerque and ~10°C in Roskilde, which would produce an error in simulated module power reaching ~6% at those times. Recent work 32 shows that all these named temperature models can be improved to account for radiative losses, and a modified Faiman model was made available in the open-source pvlib-python library as pvlib.temperature.faiman_rad (after this study took place). Nevertheless, the improved models still require the appropriate empirical heat loss coefficients for the system being simulated.
The plots in Figure 12 show the bias in annual irradiation (A) and energy yield (B) for all participants. Although the irradiation bias for most participants was positive (i.e., showing overestimation) and the overall median was very close to zero (see red dashed lines), the energy yield was underestimated by most of the participants, with a median value of −3.3%. This behavior raised the question of the derate assumptions made by the participants. Quantifying or setting the derates is a critical step in PV performance modeling. Derates (or performance loss factors) describe the losses that can occur within a system, for example, due to conductor resistance, soiling, module degradation, and so forth. After comparing derate assumptions by individual participants, it was found that the highest underestimations were exhibited by participants that over-budgeted for derates. In contrast, the participants applying modest derate assumptions achieved biases much closer to zero. This is interesting because the modeling community is often concerned with the accuracy of model equations and their parameter values, 33,34 whereas in this study, the errors were driven largely by the initial assumptions made by the modelers. It should be mentioned, however, that these scenarios include data from lab-scale systems that were built for research purposes. As such, these are likely to experience lower losses than utility-scale power plants.
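How derate assumptions propagate can be illustrated with a small sketch. The individual loss values below are hypothetical examples, not any participant's actual assumptions; combining them multiplicatively shows how an over-budgeted set of derates compounds into a noticeable underestimation:

```python
def combined_derate(losses):
    """Combine individual fractional loss factors into one multiplier.

    Each entry is a fractional loss (e.g., 0.02 = 2% soiling), and the
    combined factor is the product of the remaining fractions, which is
    then applied to the modeled DC power.
    """
    factor = 1.0
    for loss in losses:
        factor *= (1.0 - loss)
    return factor

# Hypothetical budget: soiling 2%, wiring 1.5%, mismatch 1%, degradation 1%
factor = combined_derate([0.02, 0.015, 0.01, 0.01])
```

Even these modest-looking entries already remove more than 5% of the modeled energy; a budget tuned for a utility-scale plant applied to the clean, well-maintained research systems in this study would therefore produce exactly the kind of systematic underestimation seen in Figure 12B.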
To further examine the impact of the derate assumptions, the annual bias by each participant, for each scenario (i.e., Figure 12B), was applied as a multiplicative adjustment to the corresponding hourly modeled power time-series. Low bias in the software category could be expected, since software companies know their products better than anyone else. It is also worth noting that the commercial sector, which deals mostly with larger power plants, assumed higher derate values, resulting in a higher bias spread (see orange boxplot). This is another indication that modeling different system sizes will require appropriate derate budgeting. The adjusted annual energy yield bias shows that all sectors but the student category exhibited distributions very close to zero.

| CONCLUSIONS
PVPMC's blind PV modeling intercomparison found that:

1. The irradiance transposition models seem to perform well, except the isotropic one.
2. Modeling the rear Gpoa is still challenging, with errors exceeding ~±100%. However, it should be mentioned that rear Gpoa represents ~10% or less of the total irradiance. Unfortunately, the bifacial PV time-series in this study contained only a handful of rear Gpoa days. As such, no further analysis has been conducted to investigate the impact of their variations.
Depending on data availability, future PVPMC blind modeling intercomparisons will include larger systems, subhourly time-series, investigations on rear Gpoa, and an iterative submission process that would enable a more detailed determination of the uncertainties involved at each step of a PV performance modeling pipeline.
For this comparison, measured weather and irradiance data and detailed descriptions of six PV systems from two locations (Albuquerque, New Mexico, USA, and Roskilde, Denmark) were provided. Participants were asked to simulate the systems' plane-of-array irradiance, module temperature, and DC power output and submit their results back to Sandia for processing. This work compares system-level performance modeling considering all DC-side loss factors. Rather than independently obtaining meteorological data for the simulations, participants were provided with measured meteorological and irradiance data as a starting point. This provision enabled the propagation of sources of uncertainty within the modeling pipeline instead of the results being affected by the uncertainty of the input data. Furthermore, this study was open to anyone (i.e., industry, research, and academia), rather than inviting specific individuals. As such, this article presents the multidimensional data analysis of the PVPMC blind modeling intercomparison, providing the results for each modeling step. Finally, it summarizes the lessons learned and areas where improvements are needed.

For this comparison, six scenarios of practical interest to the community were identified and included (a) fixed and tracking systems, (b) monofacial and bifacial modules, (c) modules representative of the current PV market and upcoming technologies, and (d) distinctively different geographical locations/climates (see Table 1). In Albuquerque (S1, S2), GHI was measured using a Kipp and Zonen CMP-21 pyranometer. Kipp and Zonen CH1 and Eppley normal incidence pyrheliometers (NIP) were used to measure DNI. To measure the diffuse horizontal irradiance (DHI), two Eppley precision spectral pyranometers (PSP) were used, one having a shade disk and the other having a shade band. The Gpoa was measured using a Kipp and Zonen CMP-11 pyranometer. Wind speed was measured at 10 m above ground level using a Climatronics Wind Mark III Wind Sensor.

Figure 2 shows statistics on the models used in this study. With respect to the transposition models, the majority of participants used the Perez model. 23

Another observation is an apparent time-shift in the estimates of some participants. It seems that there is confusion between instantaneous and time-averaged measurements, especially when involving sun position. This study reported the hourly averaged irradiance data at the end of the hour. Therefore, most models should assume a sun position calculated 30 min before the hourly timestamp as being the most representative. On the other hand, other data sources commonly place timestamps at the beginning of the interval. As such, some software properly account for this by calculating the solar position and other time-sensitive values at the center of the interval (i.e., +30 min). Modeling software, such as PVsyst, make an exception for timestamps that span sunrise or sunset to pick a sun position halfway between the horizon and the sun position at the neighboring daylight timestamp. In this study, some participants seemed to adjust by shifting their time-series 30 min back, while others kept it at the end of the hour. This is clearly an area where procedures could be standardized.

Empirical cumulative distribution functions (ECDFs) present residuals in ascending order to show how they are distributed across the datasets. The ECDF plots in Figure 6 show the residuals between modeled and measured Gpoa grouped by the transposition models. The off-pink lines are the individual participants, and the black dashed lines indicate the median residuals per model. A steep rise near zero suggests that there are mostly small model errors and relatively few large ones. Most models except the isotropic indicated good accuracy (median values close to zero). The isotropic model underestimated Gpoa by 11.25 W/m². Although the distributions of residuals for most Perez users cluster together, some outlying distributions still exist, indicating errors in the implementation of solar position algorithms, errors in system configuration, and the possibility of applying Perez model coefficient sets other than the most commonly used set, "All sites composite 1990." 29 When comparing residuals against system configuration, there was a slight overestimation in the single-axis tracking system in S3 (median residual of 6.5 W/m²) as compared to the fixed-tilt systems in S1 and S5, with −1.7 W/m² and 0.77 W/m², respectively.

| Module temperature modeling

First, it should be mentioned that although the accuracy of resistance temperature detector (RTD) sensors is typically within 0.1°C, it is still not possible to know the representative temperature for a PV array unless the array is equipped with multiple sensors (e.g., one for each solar cell), which is practically and economically not feasible. Therefore, although this study compares against an average module temperature value from four different sensors, the differences reported in this work should not be taken in a strictly quantitative manner.
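The end-of-interval timestamp convention discussed above for time-averaged irradiance can be sketched as follows. This is a simplified illustration with an assumed helper name, and it ignores the sunrise/sunset special case handled by tools such as PVsyst:

```python
import pandas as pd

def solar_position_times(index, label="end"):
    """Timestamps at which solar position should be evaluated.

    Hourly averages labeled at the end of the interval are best
    represented by the solar position 30 min earlier; data labeled at
    the beginning of the interval shift 30 min forward instead.
    """
    shift = pd.Timedelta(minutes=-30) if label == "end" else pd.Timedelta(minutes=30)
    return index + shift

# End-of-hour labeled data: the 01:00 stamp represents 00:00-01:00,
# so the representative solar position is evaluated at 00:30.
times = pd.date_range("2021-06-01 01:00", periods=3, freq="60min")
centers = solar_position_times(times, label="end")
```

Getting this convention wrong shifts every beam-irradiance calculation by up to an hour of sun movement, which is consistent with the time-shifted estimates observed among some participants.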

Figure 7 shows the ECDF plots of the module temperature residuals for three out of six scenarios (due to data availability), grouped by all models. It should be noted that some models, such as PVsyst (Tcell), calculate cell rather than module temperature.

FIGURE 6 Empirical cumulative distributions of the plane-of-array irradiance residuals (in W/m²) for all scenarios, grouped by the different transposition models. Participants are displayed in different colors, and the dashed black lines indicate the median residuals within the same modeling category. "Other" and "Custom" categories include models that differ within the same category.

FIGURE 7 Empirical cumulative distributions of the module temperature residuals (in °C) for all scenarios, grouped by the different temperature models. Participants are displayed in different colors, and the dashed black lines indicate the median residuals within the same modeling category. "Other" and "Custom" categories include models that differ within the same category.

Figure 8 shows the ECDF plots of the normalized power residuals for all scenarios, grouped by all model categories. The power residuals were normalized by the systems' nominal capacities to allow a meaningful comparison among the different scenarios. Overall, all models underestimated the normalized power by up to 43.3 W/kWp (or 4.33%), whereas the SDM category demonstrated superior performance with a bias close to zero (−1.07 W/kWp). PVsyst users, who comprise 33% of the participants, group well together except in one instance. This mass underestimation raises the question of whether it is because of a modeling issue, input selection, or other assumptions involved within the pipeline. This is further examined in Section 4.
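For reference, the ECDF construction used in these residual figures reduces to sorting the residuals and pairing them with cumulative fractions. A minimal sketch:

```python
import numpy as np

def ecdf(residuals):
    """Empirical cumulative distribution function of model residuals.

    Residuals are sorted in ascending order and each is paired with the
    fraction of the sample at or below it. A steep rise near zero then
    indicates mostly small errors with few large ones.
    """
    x = np.sort(np.asarray(residuals, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

# Example with four residuals (W/kWp)
x, y = ecdf([3.0, -1.0, 0.5, -2.0])
```

The median residual reported per model in the figures corresponds to the x-value where the ECDF crosses 0.5.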
FIGURE 8 Empirical cumulative distributions of the normalized power residuals for all scenarios, grouped by the different models. The power residuals are normalized by the systems' nominal capacities (i.e., W/kWp). Participants are displayed in different colors, whereas the dashed black lines indicate the median residuals within the same modeling categories. "Other" and "Custom" categories include models/software that differ within the same category.

In Figure 10, negative temperature differences were measured in Albuquerque, where low sky temperatures led to significant radiative cooling of the modules. Only one custom model separately accounts for radiative losses and correctly predicts such negative values. All other models lump radiative losses together with convective losses and represent the combined heat loss with one or two empirical heat loss coefficients. In Driesse et al., 30 it was shown that all of the named models in Figures 7 and 10 are essentially equivalent; therefore, underprediction of module temperature in the simulations is a clear indication that the heat loss coefficients were frequently set too high.

FIGURE 9 Regression model fits for modeled power versus module temperature. The scatter was removed to improve clarity. The regression model fits considered datapoints for modeled Gpoa from 800 to 1200 W/m² and wind speed <10 m/s. Participants are shown in different colors.
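The reverse calculation behind the regression fits of Figure 9 can be sketched as a simple linear fit of modeled power against module temperature over the restricted subset; a positive slope flags a likely sign or coefficient error. Function and variable names are illustrative:

```python
import numpy as np

def fitted_temperature_slope(t_mod, p_mod):
    """Reverse-calculate the apparent temperature dependency of power.

    Fits modeled power (W) linearly against module temperature (C) and
    returns the slope. A healthy submission should show a negative
    slope (power falls as modules heat up); a positive slope suggests a
    sign error or use of the current rather than power temperature
    coefficient.
    """
    slope, _intercept = np.polyfit(np.asarray(t_mod, dtype=float),
                                   np.asarray(p_mod, dtype=float), 1)
    return slope
```

Such a fit only makes sense on a subset where irradiance is roughly constant (here 800-1200 W/m² and wind speed <10 m/s), so that temperature, rather than irradiance, dominates the power variation.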

The relative efficiency across the modeled Gpoa plots (see Figure 11) can provide information on the electrical modeling assumptions made by the participants, based on whether the provided module data were used or not. The relative efficiency was calculated by taking the ratio of the modeled efficiency to each participant's "nominal" efficiency. The latter was calculated by taking a subset of modeled Gpoa from 950 to 1050 W/m² and assuming the median temperature-corrected (to 25°C) efficiency as the "nominal" efficiency for each participant. The plots in Figure 11 were categorized based on the participants' responses on PAN or IEC 61853-1 matrix data usage; the top and bottom rows correspond to S1 and S2, respectively, while the red circles are the measured values. The data in Figure 11 represent conditions where the AOI < 70° and the modeled Tmod is from 20°C to 30°C. The IEC 61853-1 data for all modules show lower efficiencies at low irradiance. Many participants' results matched these data trends, whereas others calculated flat efficiencies with Gpoa or showed higher efficiency at low irradiance.
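The relative-efficiency calculation described above can be sketched as follows, with hypothetical column names and an assumed power temperature coefficient (gamma); the study's actual analysis derives the correction from each participant's own submission:

```python
import pandas as pd

def relative_efficiency(df, gamma=-0.0035):
    """Relative efficiency versus an internally defined 'nominal' value.

    Efficiency is temperature-corrected to 25 C with an assumed power
    temperature coefficient (gamma, 1/C); the median corrected
    efficiency for modeled Gpoa between 950 and 1050 W/m2 defines the
    'nominal' efficiency, and all points are expressed relative to it.
    """
    eta = df["p_mod"] / (df["gpoa_mod"] * df["array_area"])
    eta25 = eta / (1.0 + gamma * (df["t_mod"] - 25.0))  # correct to 25 C
    band = df["gpoa_mod"].between(950.0, 1050.0)
    nominal = eta25[band].median()
    return eta25 / nominal
```

Because the reference is each participant's own near-STC efficiency, the ratio isolates the assumed efficiency-versus-irradiance shape from absolute calibration differences, which is what makes the low-irradiance roll-off (or its absence) visible in Figure 11.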

The adjusted bias is shown in Figure 12C, where the overall median bias is nearly zero, indicating that the majority of bias errors in Figure 12B were indeed linear. This reinforces the potential conclusion that input assumptions matter more than the model, at least for the climates and systems investigated in this study. How do these results compare to the original PVPMC blind modeling study in 2010? 8,9 Figure 13 (left) shows the bias in annual energy yield from the 2010 blind modeling study, whereas the plot on the right exhibits the bias from this study; these are categorized by the models. Overall, there is a large shift from overestimating energy yield (as high as ~20% in 2010) to being very conservative with the derate assumptions in this study.

3. Standardization is needed for handling sun position calculations when using time-averaged irradiance measurements.

4. Incorporating a radiative loss term in module temperature modeling appears to improve accuracy.

5. There is confusion around the U values for the Faiman and PVsyst temperature models. Uc and Uv (PVsyst) values should not be used in place of U0 and U1 (Faiman) values.

6. Most software and models showed similar results, indicating good reproducibility among participants, especially when compared with the 2010 blind modeling study. For example, the spread in estimated energy yield among PVsyst participants is now ~6% compared with ~33% in 2010.

7. Uncertainty and large variation in derate factors between participants appear to explain most of the differences; it was observed that modelers overestimated the derates, resulting in significant power underestimation.

8. Human errors are not uncommon. The intercomparison highlighted several errors related to the temperature coefficients and the efficiency across irradiance. There is an opportunity to develop screening tests that can detect such errors, thus assuring stakeholders of the accuracy of the modeling results.

9. Modeler skill at understanding, choosing, and using the models and their parameters correctly, and accumulated experience observing various derate mechanisms in operational systems, seems to be more important than the PV model itself (see points 7 and 8 above).