Capacity value evaluation of wind farms considering the correlation between wind power output and load

Funding information: National Key R&D Program of China, Grant/Award Number: 2017YFA0700300; Nanjing Tech University for talent introduction, Grant/Award Number: 39810127; State Grid Science and Technology Program of China, Grant/Award Number: SGJSJX00YJJS1800721

Abstract
A method of capacity value evaluation for wind farms considering the correlation between wind power and load is presented. The paper starts by defining the metric of capacity value, called capacity credit, and its basic evaluation process. It then focuses on the core part of capacity credit evaluation, which is the reliability assessment of power systems. In this core part, two limitations of the frequently used cross entropy based importance sampling method are analysed. To solve the problems, an improved method is proposed by using a truncated Gaussian mixture model as the proposal distribution of the cross entropy based importance sampling method. This improved method is adopted to speed up the reliability assessment of composite power systems in the capacity credit evaluation. Finally, several numerical tests are designed and performed on the IEEE-RTS 79 and IEEE-RTS 96 test systems. The results show that the improved method is faster than traditional cross entropy based importance sampling methods when assessing the reliability of power systems. Besides, the efficiency of the improved method is almost impervious to the correlation of load and wind power output, which ensures its applicability in different scenarios.


INTRODUCTION
With the concerns about the exhaustion of fossil fuels and environmental protection, many scholars and experts are paying attention to different kinds of renewable energy like wind and solar energy [1][2][3]. Obviously the load demand cannot be satisfied by the renewable energy source (RES) alone, so RES is often considered to have a negligible capacity value. If the capacity of RES is considered as zero in the planning of power systems, many conventional generation units are needed to meet the load demand. However, the capacity value of RES is actually not zero, so some of the conventional units are redundant, indicating an excessive investment in generation capacity. Therefore, it is necessary to evaluate the value of RES objectively, which is frequently accomplished by using the concept of capacity credit (CC) [4,5]. The essence of CC evaluation of RES is to quantify the disparity of the power system reliability before and after the RES integration [6].
To quantitatively define CC, various metrics are designed, such as equivalent firm capacity, equivalent conventional capacity and the equivalent load carrying capacity (ELCC) [7]. The first two metrics are defined from the generation side to indicate the conventional generation capacity that can be replaced by the evaluated RES while the reliability level of the whole system is kept unchanged. From this perspective, the two metrics are used mainly in the portfolio of generating capacity. ELCC is defined from the demand side, indicating the extra load that can be supported after the integration of the evaluated RES while the reliability level of the whole system is kept unchanged. From this perspective, it is believed that ELCC is more suitable in power system expansion planning which needs to consider load growth [8,9]. It is pointed out in ref. [8] that, when all of the aforementioned metrics adopt probability of load curtailment (PLC) as the indicator of reliability level, the variations of these metrics with PLC are rather consistent. Besides, ELCC and equivalent firm capacity provide rather similar results. These CC metrics of wind generation have already been used in adequacy calculation in the capacity markets of different jurisdictions [10]. ELCC is also adopted in ref. [11] for optimising the coordinated procurement of multi-area strategic demand reserve. We also employ ELCC as the metric of CC in this paper.
According to the essence and various definitions of CC, it is concluded that any CC evaluation method needs to solve two problems: 1) How to assess the reliability level of composite power systems; 2) how to find the value of the CC metric with the reliability level of the whole system kept unchanged. A classic method for evaluating the CC of wind energy is introduced in ref. [12]. In this method, the Monte Carlo simulation (MCS) is used to assess the reliability of the power system, and the secant method is used to carry out an iterative process to search for the CC value. This method is improved in ref. [13] by incorporating the parallel computing technique to accelerate the reliability assessment process.
In fact, the second problem can be well solved by the secant method, which is seen in many relevant works [6,14]. Hence, the efficiency of all CC evaluation methods mainly relies on the fast and accurate reliability assessment of the power system. There are several categories of methods for power system reliability assessment, but the MCS methods can consider the transmission line outages and the power flow constraints of the power system in a much simpler and more intuitive way. Therefore, it is pointed out in ref. [15] that the MCS methods currently appear to be more attractive than other categories of methods.
The more preferred MCS methods often incorporate variance reduction techniques, such as the importance sampling (IS) [16], the Latin Hypercube sampling [17] and the stratified sampling [18]. In the mentioned techniques, IS methods generally perform better and attract more attention. However, applying IS methods to power system reliability assessment needs to find a proposal distribution for all the random variables in the system, which is somewhat difficult. To find the proposal distribution for IS, a series of methods is proposed by using the cross entropy theory, which is generally more systematic and efficient than other methods. IS methods incorporating the cross entropy theory are called cross entropy based importance sampling (CE-IS) for short.
A version of CE-IS is provided in ref. [19] to speed up the reliability assessment by adjusting the fault probability of the conventional units and the transmission lines. However, the load profile remains unchanged, which limits the acceleration effect of this method. To solve the problem, an improved version of CE-IS is developed in ref. [20], where the probability distribution function (PDF) of load is also adjusted to achieve better performance in acceleration. Because this method fails to consider the integration of RES, it is further improved in ref. [21], where copula theory is used to model the correlation between wind farms.
According to the review of previous work, there are still problems remaining unsolved in CE-IS.
1) With the increasing integration of RES, there are many more random variables with different probability distributions in the power system. Therefore, finding suitable proposal distributions conveniently for the variables becomes more complicated than before.
2) Although some of the CE-IS methods above have modelled the correlation between variables, few of them adapt to the possible different correlation structures, so their efficiency declines as the correlation between load and wind power increases. It is noteworthy that "coping with the correlation" is not equal to "modelling the correlation." Correlation modelling is on the data input side, which is merely used to generate correlated samples as the input of CE-IS or other algorithms [22,23]. However, the existing CE-IS methods are unable to cope with these highly correlated samples efficiently, so CE-IS needs adaptation.
To alleviate the efficiency decline, copula theory is introduced in ref. [21] to try to conserve the correlation in the IS PDF. However, the updating formulas of the IS PDF are still the same as those in ref. [20], which means the influence of the copula function on updating the IS PDF is neglected. Hence, the resulting IS PDF does not obey the cross entropy theory strictly and thus is not mathematically rigorous, so the efficiency of the CE-IS method in ref. [21] is limited to a certain degree. Similar effort has also been made in ref. [24]. Hence, there is not a satisfactory solution to the efficiency decline.
This paper focuses on addressing the two problems with the main contributions listed below.
1) The Gaussian mixture (GM) model is used in CE-IS to search the proposal distributions of random variables, which is able to fit different kinds of distributions and consider the correlation between variables at the same time.
2) The values of load demand and wind power output definitely have bounds, while the range of the variables in the GM model is (-∞, +∞). That means the GM model cannot be used to generate samples directly. Hence, a tailor-made truncation process is designed and imposed on the GM model and a sampling method is constructed accordingly.
3) The CC evaluation process is improved by using the truncated GM model based CE-IS (TGCE-IS) as the reliability assessment method and various factors influencing the CC of wind energy are investigated.
The rest of this paper is organised as follows: Section 2 summarises the definition and basic evaluation process of CC. Section 3 overviews the basic theory of CE-IS and analyses the two limitations of traditional CE-IS methods. Section 4 presents the general idea and process of TGCE-IS, and Section 5 details how to use TGCE-IS to assess the reliability of a specific composite power system. In Section 6, numerical tests are performed to verify the proposed method and some influencing factors of CC are investigated. Section 7 lists the main findings in the results of numerical tests. The conclusions are summarised in Section 8.

DEFINITION AND EVALUATION PROCESS OF CAPACITY CREDIT
ELCC is chosen as the metric of CC in this paper. The ELCC of a generating resource is defined based on its contribution to the system reliability, which is described in ref. [6] and can be expressed as

$$\gamma(G, L) = \gamma(G_{new}, L_{adj}), \qquad CC = \frac{L_{adj} - L}{G_{new} - G} \qquad (1)$$

where γ(G,L) is a function of the reliability index of the power system with respect to G and L. G is the total installed capacity of all generating resources and L is the maximum possible load (or peak load). G new is the total installed capacity after the evaluated generating resource is added. L adj is the peak load after adjustment. According to Equation (1), CC quantified by ELCC is actually the extra load that the system can support after the evaluated generating resource is added, while the reliability level of the whole system remains unchanged. The typical value of CC is between 0 and 1, because no generating resource is fully reliable to provide the power output equalling its installed capacity all the time.

FIGURE 1 Flowchart of equivalent load carrying capacity evaluation using the secant method to adjust L adj
For a given power system, G and L are known and constant, so γ(G,L) is also constant. If a new generating resource is added to the power system with its installed capacity, G becomes G new , which is still known and seen as constant. Therefore, in the CC evaluation of this newly added resource, only L adj needs to be adjusted to satisfy the first equation in Equation (1). Hence, CC evaluation is a problem of solving the first equation of Equation (1), where γ(G new ,L adj ) is seen as a function only related to L adj .
Hence, a method for rapidly solving equations is necessary. As stated in the Introduction, the secant method is frequently used to take on this task. More details about the application of the secant method to ELCC evaluation can be acquired in ref. [6]. The procedures of the ELCC evaluation are shown in Figure 1.
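The secant search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_plc` is a hypothetical closed-form stand-in for the reliability assessment (the paper uses a composite-system assessment instead), the capacity and load figures are illustrative, and CC is taken as the extra load divided by the added capacity, which is consistent with CC lying between 0 and 1.

```python
import math


def solve_adjusted_peak_load(reliability, g_old, l_old, g_new,
                             rel_tol=1e-6, max_iter=50):
    """Secant search for L_adj such that reliability(g_new, L_adj) equals
    reliability(g_old, l_old). Returns (L_adj, capacity credit)."""
    target = reliability(g_old, l_old)

    def mismatch(load):
        return reliability(g_new, load) - target

    # Two starting guesses: the original peak load, and the original peak
    # load plus the full added capacity (the case CC = 1).
    l_prev, l_curr = l_old, l_old + (g_new - g_old)
    f_prev, f_curr = mismatch(l_prev), mismatch(l_curr)
    for _ in range(max_iter):
        if abs(f_curr) <= rel_tol * target or f_curr == f_prev:
            break
        l_next = l_curr - f_curr * (l_curr - l_prev) / (f_curr - f_prev)
        l_prev, f_prev = l_curr, f_curr
        l_curr, f_curr = l_next, mismatch(l_next)
    return l_curr, (l_curr - l_old) / (g_new - g_old)


def toy_plc(capacity, peak_load, firm=3405.0, wind_share=0.3):
    """Hypothetical stand-in for the reliability assessment: capacity beyond
    `firm` MW counts at only `wind_share` of its rating."""
    effective = min(capacity, firm) + wind_share * max(capacity - firm, 0.0)
    return math.exp(-(effective - peak_load) / 100.0)


if __name__ == "__main__":
    l_adj, cc = solve_adjusted_peak_load(toy_plc, 3405.0, 2850.0, 3855.0)
    print(f"L_adj = {l_adj:.1f} MW, CC = {cc:.3f}")
```

Every secant iteration calls the reliability function once, which is why the speed of the reliability assessment dominates the cost of the whole ELCC evaluation.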
In this paper, the reliability index computation in the ELCC evaluation considers the transmission capacity limits of power lines, which is actually the reliability assessment of the composite power system. It is more complicated than generating system adequacy assessment and is the focus of this paper.

METHODOLOGY OF TRADITIONAL CROSS ENTROPY BASED IMPORTANCE SAMPLING METHODS IN POWER SYSTEM RELIABILITY ASSESSMENT
According to Figure 1, the power system reliability assessment is repeated multiple times, so its computing speed has an important effect on the ELCC evaluation. As stated in the Introduction, an attractive method for reliability assessment is CE-IS, which is a variant of MCS with promising performance in variance reduction.
In this section, the basic ideas of MCS and CE-IS are overviewed first. Then the limitation of traditional CE-IS is analysed.

Overview of Monte Carlo simulation methods
To assess the power system reliability is to compute γ(⋅) in Equation (1), which reflects the possibility and consequences of power shortage. γ(⋅) can refer to various reliability indices, among which PLC, expected duration of load curtailments (EDLC), expected demand not supplied (EDNS) and expected energy not supplied (EENS) are the most frequently used ones. PLC is chosen as γ(⋅) to present the whole algorithm of ELCC evaluation in this paper, but the other indices can be used in the same way.
The definition of PLC in ref. [25] is

$$\gamma_{PLC} = \int_{\Omega} I(x)\,p(x)\,\mathrm{d}x \qquad (2)$$

where γ PLC is short for γ PLC (⋅), which is a realisation of γ(⋅) in Equation (1) by using PLC as the reliability index. x is the state vector of the power system, which consists of many random variables and is detailed in the following section. Ω is the set of all possible system states. p(x) is the PDF of x. I(x) is an indicator function, which is defined as

$$I(x) = \begin{cases} 1, & \text{if } x \text{ causes loss of load} \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

where optimal power flow (OPF) analysis is needed to tell whether x causes loss of load or not. The model of OPF will be detailed later. Equation (2) is actually the expectation of I(x), with x obeying the distribution defined by p(x). Therefore, MCS can be applied to Equation (2), because the essence of MCS is using the sample average to estimate the real expectation. By using MCS, PLC is estimated as

$$\gamma_{PLC,MC} = \frac{1}{n_S}\sum_{i=1}^{n_S} I(x_i) \qquad (4)$$

where n S is the sample size, x i is the i-th sample of x and γ PLC,MC is the estimate of γ PLC by using MCS. Note that all the samples of x are generated according to p(x). The more samples are used in Equation (4), the more accurate γ PLC,MC is. The accuracy of γ PLC,MC can be measured by the coefficient of variation, which is defined as

$$\beta_{PLC,MC} = \frac{\mathrm{std}(\gamma_{PLC,MC})}{\gamma_{PLC,MC}} = \frac{\mathrm{std}(I(x))}{\sqrt{n_S}\,\gamma_{PLC,MC}} \qquad (5)$$

where β PLC,MC is the coefficient of variation of γ PLC,MC . std(⋅) is the standard deviation. According to [25], the differences between PLC, EDLC, EDNS and EENS lie in the different definitions of I(x) in Equation (2). However, regardless of the definition of I(x), Equation (2) is always in the form of an expectation. Hence, EDLC, EDNS and EENS can also be estimated by MCS and its variants, including the improved CE-IS method which will be proposed in the following section of the paper.
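As a rough illustration of the sample-average estimate and its coefficient of variation, the sketch below applies crude MCS to a hypothetical generation-only toy system (five identical units, constant load, no network). The toy system, the function names and the numbers are assumptions made for illustration; in the paper the indicator is evaluated by OPF analysis.

```python
import math
import random


def mcs_plc(sample_state, loses_load, n_samples, seed=0):
    """Crude Monte Carlo estimate of PLC and its coefficient of variation."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples) if loses_load(sample_state(rng)))
    if hits == 0:
        return 0.0, float("inf")
    plc = hits / n_samples
    # For a 0/1 indicator, std(I) = sqrt(plc * (1 - plc)).
    cov = math.sqrt(plc * (1.0 - plc)) / (math.sqrt(n_samples) * plc)
    return plc, cov


# Hypothetical toy system: five 100 MW units, each unavailable with
# probability 0.1, serving a constant 350 MW load.
UNIT_MW, FAIL_PROB, LOAD_MW = [100.0] * 5, 0.1, 350.0


def sample_units(rng):
    # 1 = operating, 0 = failed
    return [0 if rng.random() < FAIL_PROB else 1 for _ in UNIT_MW]


def capacity_shortfall(state):
    return sum(c * s for c, s in zip(UNIT_MW, state)) < LOAD_MW


if __name__ == "__main__":
    plc, cov = mcs_plc(sample_units, capacity_shortfall, 20000)
    print(f"PLC = {plc:.4f}, coefficient of variation = {cov:.3f}")
```

For a fixed target coefficient of variation, the required sample size grows roughly in inverse proportion to PLC, which is the motivation for the variance reduction discussed next.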

Basic idea of traditional cross entropy based importance sampling methods
When PLC is small (e.g., below 1 × 10⁻³), MCS requires a very large sample size to make the accuracy of γ PLC,MC acceptable.
The IS methods belong to variants of MCS, which can reduce the sample size of MCS, so they can accelerate the estimation of PLC. The basic idea of IS is to convert Equation (2) as below [24].
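$$\begin{aligned} \gamma_{PLC} &= \int_{\Omega} I(x)\,\frac{p(x)}{g(x)}\,g(x)\,\mathrm{d}x = \int_{\Omega} I(x)\,Q(x)\,g(x)\,\mathrm{d}x \\ &\approx \frac{1}{n_S}\sum_{i=1}^{n_S} I(x_i)\,Q(x_i) = \gamma_{PLC,IS} \end{aligned} \qquad (6)$$

where g(x) is a new PDF, named IS PDF throughout this paper. Q(x) = p(x)/g(x) is named the likelihood ratio function. γ PLC,IS is the estimate of γ PLC by using IS. Similar to Equation (5), the accuracy of γ PLC,IS is measured by

$$\beta_{PLC,IS} = \frac{\mathrm{std}(I(x)Q(x))}{\sqrt{n_S}\,\gamma_{PLC,IS}} \qquad (7)$$

where β PLC,IS is the coefficient of variation of γ PLC,IS . The last row of Equation (6) is still a sample average like MCS, but the samples used in Equation (6) are generated by using g(x) as the PDF. Selecting a proper IS PDF g(x) leads to

$$\mathrm{std}(I(x)Q(x)) < \mathrm{std}(I(x)) \qquad (8)$$

According to Equations (5) and (7), if β PLC,MC and β PLC,IS are fixed to the same level, the smaller std(I(x)Q(x)) is than std(I(x)), the smaller the sample size of IS. Hence, how to select g(x) is the core of IS.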
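The effect of the likelihood ratio can be seen on a small hypothetical example. The sketch below estimates a rare PLC for five identical units by sampling unit outages from an IS PDF with inflated outage probabilities `eta` and weighting each loss-of-load sample by Q(x) = p(x)/g(x). The toy system and the hand-picked value of `eta` are assumptions for illustration; CE-IS would instead search for the IS parameters iteratively.

```python
import math
import random
import statistics


def is_plc(theta, eta, loses_load, n_samples, seed=0):
    """Importance sampling estimate of PLC for independent two-state units.

    theta: original outage probabilities, eta: outage probabilities of the
    IS PDF. Returns the estimate and its coefficient of variation."""
    rng = random.Random(seed)
    weighted = []
    for _ in range(n_samples):
        # Draw unit states from the IS PDF g: 1 = operating, 0 = failed.
        x = [0 if rng.random() < e else 1 for e in eta]
        if not loses_load(x):
            weighted.append(0.0)
            continue
        q = 1.0  # likelihood ratio p(x) / g(x)
        for xi, t, e in zip(x, theta, eta):
            q *= (t / e) if xi == 0 else ((1.0 - t) / (1.0 - e))
        weighted.append(q)
    estimate = statistics.fmean(weighted)
    if estimate == 0.0:
        return 0.0, float("inf")
    cov = statistics.pstdev(weighted) / (math.sqrt(n_samples) * estimate)
    return estimate, cov


def capacity_shortfall(state, unit_mw=100.0, load_mw=350.0):
    # Hypothetical toy criterion: available capacity below a constant load.
    return unit_mw * sum(state) < load_mw


if __name__ == "__main__":
    est, cov = is_plc([0.01] * 5, [0.4] * 5, capacity_shortfall, 20000)
    print(f"PLC = {est:.3e}, coefficient of variation = {cov:.3f}")
```

With the original outage probability of 0.01, crude MCS at the same sample size would see only a handful of loss-of-load samples, whereas the tilted proposal produces them in most draws and corrects for the bias through Q(x).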
A more systematic way of selecting g(x) is CE-IS. Due to limited pages, we summarise the basic procedures of CE-IS in refs. [14,20,21] by Figure 2.
According to Figure 2, CE-IS methods can be divided into two stages. The first stage aims at finding the IS PDF g(x). The second stage uses the IS PDF to generate samples to estimate the index.

Limitations of traditional cross entropy based importance sampling methods
There are mainly two limitations on the traditional CE-IS methods: 1) As shown in Figure 2, traditional CE-IS methods need to choose a parametric function form for IS PDF in advance. For example, the CE-IS methods in refs. [14,20,21] choose Gaussian or Weibull distribution functions as the IS PDF of load or wind power output. This choice relies more on experience and may be somewhat arbitrary, so the efficiency of CE-IS may be limited by an improper choice. 2) The correlation between continuous variables is either omitted or not addressed mathematically rigorously in the IS PDF, which causes the efficiency of CE-IS to decline in some cases. The first problem is easy to understand, so we focus on analysing the second problem with a 2-dimensional case.
Denote a continuous vector ξ = (y,z). Assume that y is almost perfectly positively correlated with z, so the samples of ξ are limited in a narrow band, like the red band in Figure 3.
Denote the area enclosed by the black ellipse in Figure 3 as the set A. Then construct an indicator function I(ξ) as

$$I(\xi) = \begin{cases} 1, & \xi \in A \\ 0, & \text{otherwise} \end{cases} \qquad (9)$$

Based on the definitions above, the probability of any point ξ being located in A, Pr(ξ∈A), is expressed as

$$\Pr(\xi \in A) = \int I(\xi)\,p(\xi)\,\mathrm{d}\xi \qquad (10)$$

where Pr(⋅) stands for the probability. p(ξ) is the PDF of ξ.
The CE-IS method used in refs. [14,20] sets the IS PDF as a multivariate Gaussian PDF, where the correlation coefficient matrix (CCM) of all random variables is the identity matrix. That means the parametric function form chosen for the IS PDF of ξ by this method is:

$$g(\xi) = \prod_{v \in \{y, z\}} \frac{1}{\sqrt{2\pi}\,\sigma_v}\exp\!\left(-\frac{(v - \mu_v)^2}{2\sigma_v^2}\right) \qquad (11)$$

where μ v and σ v are the mean and standard deviation of the Gaussian PDF of variable v. By using this g(ξ), the samples generated by CE-IS scatter in the blue area in Figure 3. However, although many samples are located in the set A due to the optimisation of μ and σ, few of them are included in the red band.
The problem stated above is obviously caused by omitting the correlation of the random variables in the IS PDF, which generates too many "Q(ξ) = 0" samples. This efficiency decline of traditional CE-IS methods will be presented intuitively in the section of numerical tests.

METHODOLOGY OF TRUNCATED GAUSSIAN MIXTURE MODEL BASED CROSS ENTROPY BASED IMPORTANCE SAMPLING
According to the analysis above, the problems of CE-IS exist in addressing the continuous variables. Therefore, TGCE-IS is developed in this section, which differs from the traditional CE-IS methods only in addressing the continuous variables.
TGCE-IS introduces the GM model as the IS PDF of continuous variables, which can solve the two problems of CE-IS because: 1) The GM model is a non-parametric model, which can fit any distribution of continuous variables accurately if it contains enough components. Therefore, adopting it as the IS PDF can solve the first limitation of traditional CE-IS. 2) Every component of the GM model is a multivariate Gaussian PDF, whose covariance matrix contains the information of correlation. If the covariance matrices participate in the updating of parameters of IS PDF, the correlation is addressed mathematically rigorously in CE-IS. That means the second limitation is also solved.
However, the domain of the GM model is without boundaries, while the variables in reliability assessment are limited in certain domains. Therefore, truncation is needed to make the GM model applicable in reliability assessment. Hence, the key points of TGCE-IS are: 1) Construct the IS PDF of continuous variables using the truncated GM model; 2) Sample the truncated GM model; 3) Find the IS parameters of the truncated GM model.

Construction of the importance sampling probability distribution function with the truncated Gaussian mixture model
In the reliability assessment, the discrete variables are all independent, so the GM model is only used to construct the IS PDF of continuous variables. The IS PDF of discrete variables will be detailed in the next section when TGCE-IS is formally applied to power system reliability assessment.
Denote discrete and continuous variables as x dv and x cv , which meet

$$x = x_{dv} \cup x_{cv} \qquad (12)$$

The IS PDF of x cv is constructed by the GM model as

$$g(x_{cv}) = \sum_{e=1}^{n_{GS}} \omega_e\,\varphi(x_{cv};\mu_e,\Sigma_e) \qquad (13)$$

where n GS is the number of components of the GM model. ω e , μ e and Σ e are the weight, expectation and covariance matrix of the e-th component. The sum of ω e is 1. φ(⋅) is the Gaussian PDF. n is the dimensionality of x cv . Firstly, the two-stage method in ref. [26] is adopted to initialise n GS , ω e , μ e and Σ e by using the historical data of x cv .
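Then the steps of truncating the GM model are given:
Step 1: Decompose every component of the GM model into a copula density function and its marginal Gaussian PDFs as

$$\varphi(x_{cv};\mu_e,\Sigma_e) = c_{gs}(x_{cv};\Sigma_e)\prod_{o=1}^{n}\varphi(x_{cv,o};\mu_{e,o},\sigma_{e,o}) \qquad (14)$$

where c gs (x cv ;Σ e ) is the Gaussian copula density function, whose expression is given in ref. [27]. x cv,o is the o-th element of x cv . σ e,o in Equation (14) can be derived by

$$\mathrm{diag}(\sigma_{e,1},\ldots,\sigma_{e,n}) = \left(\mathrm{diag}(\Sigma_e)\right)^{\circ(1/2)} \qquad (15)$$

where (⋅)^∘(1/2) is the Hadamard square root. diag(Σ e ) is a diagonal matrix whose diagonal elements are the same as those of Σ e .
Step 2: Truncate every marginal Gaussian PDF to the bounded range of the corresponding variable, which gives the truncated marginal PDF φ tr (x cv,o ;μ e,o ,σ e,o ) in Equation (16).
Step 3: Formulate the truncated IS PDF by Equation (17).

$$g_{tr}(x_{cv}) = \sum_{e=1}^{n_{GS}} \omega_e\,c_{gs}(x_{cv};\Sigma_e)\prod_{o=1}^{n}\varphi_{tr}(x_{cv,o};\mu_{e,o},\sigma_{e,o}) \qquad (17)$$

where g tr (x cv ) is the truncated version of g(x cv ).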

Sampling of the truncated Gaussian mixture model
The process of sampling the truncated GM model is designed as a component-wise process. If m samples are needed, draw mω e samples from the e-th component for e = 1,…,n GS . After all components are sampled, the resulting m samples follow the truncated GM model.
The procedure for sampling the e-th component is:
Step 1: Form an mω e ×n matrix X e , where the o-th column consists of mω e samples drawn from φ tr (x cv,o ;μ e,o ,σ e,o ). Then construct a permutation matrix L e , whose elements in each column are the ranks of the elements in the corresponding column of X e .
Step 2: Compute the CCM of L e and denote it by ρ ori , which is an n×n matrix.
Step 3: Use Cholesky factorisation to decompose ρ ori as

$$\rho_{ori} = D D^{T} \qquad (18)$$

where D is a lower triangular matrix.
Step 4: Extract the CCM from Σ e as

$$\rho = \mathrm{diag}(\sigma_e)^{-1}\,\Sigma_e\,\mathrm{diag}(\sigma_e)^{-1} \qquad (19)$$

where ρ is the CCM of the e-th component and diag(σ e ) is the diagonal matrix of the standard deviations σ e,o .
Decompose ρ by Cholesky factorisation as

$$\rho = G G^{T} \qquad (20)$$

where G is a lower triangular matrix.
Step 5: L e is transformed as

$$L_{e,new} = L_e \left(G D^{-1}\right)^{T} \qquad (21)$$

where the CCM of L e,new becomes ρ.
Step 6: Permute X e according to the permutation matrix of L e,new .
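The component-wise allocation and the draw from a truncated marginal can be sketched for a single variable as follows. This is a one-dimensional illustration only: the inverse-CDF draw, the handling of the rounding remainder and all names and numbers are assumptions, and the rank-based reordering of Steps 2 to 6 is left out.

```python
import random
from statistics import NormalDist


def allocate_counts(m, weights):
    """Split m samples over the components in proportion to their weights."""
    counts = [int(m * w) for w in weights]
    # Hand the rounding remainder to the heaviest components.
    remainder = max(0, m - sum(counts))
    order = sorted(range(len(weights)), key=lambda e: weights[e], reverse=True)
    for e in order[:remainder]:
        counts[e] += 1
    return counts


def sample_truncated_normal(rng, mu, sigma, low, high, size):
    """Inverse-CDF draw from a Gaussian PDF truncated to [low, high]."""
    dist = NormalDist(mu, sigma)
    p_low, p_high = dist.cdf(low), dist.cdf(high)
    out = []
    for _ in range(size):
        u = p_low + (p_high - p_low) * rng.random()
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside (0, 1)
        out.append(min(max(dist.inv_cdf(u), low), high))
    return out


def sample_truncated_gm_1d(weights, mus, sigmas, low, high, m, seed=0):
    """Draw m samples from a one-dimensional truncated GM model."""
    rng = random.Random(seed)
    samples = []
    counts = allocate_counts(m, weights)
    for count, mu, sigma in zip(counts, mus, sigmas):
        samples.extend(sample_truncated_normal(rng, mu, sigma, low, high, count))
    return samples


if __name__ == "__main__":
    # Hypothetical two-component model for a 150 MW wind farm output.
    draws = sample_truncated_gm_1d([0.3, 0.7], [20.0, 120.0], [30.0, 40.0],
                                   0.0, 150.0, 1000)
    print(min(draws), max(draws))
```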

Searching for importance sampling parameters of the Gaussian mixture model
According to the optimisation process stated in ref. [28], the estimates of ω e , μ e and Σ e can be derived by solving an optimisation problem in Equation (22).
Equation (22) can be solved by the expectation-maximisation algorithm adopted in ref. [28], which updates ω e , μ e and Σ e iteratively; the superscript (l) denotes the l-th iteration. Note that: 1) Stopping criterion: if the parameters in the l-th iteration are close enough to their counterparts in the (l-1)-th iteration, stop the iteration. 2) Adjusting the number of components: if ω e (l) becomes lower than 1/m, delete this component.
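To make the iteration concrete, the following sketch shows one generic weighted expectation-maximisation update for a one-dimensional GM model, where each sample carries an importance weight such as I(x)Q(x). It follows the standard weighted EM form and is not claimed to reproduce the exact update formulas of ref. [28] or the truncated model; the function names and the variance floor are assumptions.

```python
import math


def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def weighted_em_step(samples, is_weights, omegas, mus, sigmas, min_sigma=1e-9):
    """One weighted EM update of a 1-D Gaussian mixture."""
    n_comp = len(omegas)
    acc_w, acc_x = [0.0] * n_comp, [0.0] * n_comp
    resp_rows = []
    for x, w in zip(samples, is_weights):
        dens = [o * gaussian_pdf(x, m, s) for o, m, s in zip(omegas, mus, sigmas)]
        total = sum(dens)
        if w == 0.0 or total == 0.0:
            resp_rows.append(None)  # this sample carries no information
            continue
        # Responsibility of each component, scaled by the importance weight.
        resp = [w * d / total for d in dens]
        resp_rows.append(resp)
        for e in range(n_comp):
            acc_w[e] += resp[e]
            acc_x[e] += resp[e] * x
    total_w = sum(acc_w)
    if total_w == 0.0:
        raise ValueError("all importance weights are zero")
    new_omegas = [a / total_w for a in acc_w]
    new_mus = [ax / a if a > 0 else m for ax, a, m in zip(acc_x, acc_w, mus)]
    new_sigmas = []
    for e in range(n_comp):
        if acc_w[e] == 0.0:
            new_sigmas.append(sigmas[e])
            continue
        var = sum(r[e] * (x - new_mus[e]) ** 2
                  for x, r in zip(samples, resp_rows) if r is not None)
        new_sigmas.append(max(math.sqrt(var / acc_w[e]), min_sigma))
    return new_omegas, new_mus, new_sigmas


def prune_components(omegas, mus, sigmas, m):
    """Delete components whose weight is below 1/m and renormalise."""
    keep = [e for e, o in enumerate(omegas) if o >= 1.0 / m]
    scale = sum(omegas[e] for e in keep)
    return ([omegas[e] / scale for e in keep],
            [mus[e] for e in keep], [sigmas[e] for e in keep])
```

A component with weight below 1/m is expected to contribute less than one of the m samples, which is one way to read the deletion rule above.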

USING TRUNCATED GAUSSIAN MIXTURE MODEL BASED CROSS ENTROPY BASED IMPORTANCE SAMPLING TO ASSESS THE RELIABILITY OF POWER SYSTEMS WITH WIND FARMS
In Section 4, the continuous variables are of much concern. However, both continuous and discrete variables exist in the power system. As stated at the beginning of Section 4, TGCE-IS addresses the discrete variables in the same way as CE-IS. Then the whole process of TGCE-IS in power system reliability assessment is elaborated below.
Firstly, x is instantiated as

$$x = [x_L, x_C, x_W, x_{LD}]$$

where x L is the vector of the states of all transmission lines. x C is the vector of the states of all conventional units. x W is the vector of the maximum available power output of all wind farms. x LD is the vector of the load at all buses. Then TGCE-IS is divided into two stages, similarly to Figure 2.
First stage: Find the IS PDFs of all variables. Second stage: Generate samples according to the IS PDFs to estimate PLC.

First stage: Obtain importance sampling probability distribution functions

Construction of original probability distribution functions
Use the method in ref. [14] to construct the original PDFs for x.
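The original PDF of x W and x LD is

$$p(x_{LW}) = c_{dv}(x_{LW})\prod_{i=1}^{n_W} p(x_{W,i})\prod_{k=1}^{n_{LD}} p(x_{LD,k}) \qquad (28)$$

where x LW = x W ∪x LD . c dv (x LW ) is the D-vine copula density function. p(x W,i ) and p(x LD,k ) are the marginal PDFs of the i-th wind farm output and the k-th bus load. n W is the number of wind farms. n LD is the number of load buses.
The original PDF of transmission lines is

$$p(x_L) = \prod_{i=1}^{n_L} \theta_{L,i}^{\,1 - x_{L,i}}\,(1 - \theta_{L,i})^{x_{L,i}} \qquad (29)$$

where θ L,i is the failure rate of the i-th transmission line. x L,i can be either 0 or 1, indicating the failure or operation state of the i-th transmission line, respectively. n L is the number of transmission lines. The original PDF of conventional units is

$$p(x_C) = \prod_{j=1}^{n_C} \theta_{C,j}^{\,1 - x_{C,j}}\,(1 - \theta_{C,j})^{x_{C,j}} \qquad (30)$$

where θ C,j is the failure rate of the j-th conventional unit. x C,j can be either 0 or 1, indicating the failure or operation state of the j-th conventional unit, respectively. n C is the number of conventional units.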

Definition of indicator function I(x)
According to Equation (3), defining I(x) requires judging whether loss of load occurs under the state x. This judgement can be made using two OPF models, namely the minimum load shedding model and the maximum load increase model. Firstly, the two OPF models are introduced. The minimum load shedding model of ref. [14], Equation (31), minimises the load shedding with P C , P LD,sh and P W as decision variables, where P LD,sh is the vector of the amount of load cut off at all buses. P C is the vector of power output of all conventional units. P W is the vector of the power output of all wind farms. P C,min and P C,max are the minimum and maximum power output of conventional units. P L is the vector of power flow in all transmission lines. P L,min and P L,max are the minimum and maximum transmitted power in all the transmission lines. SF L,C , SF L,W and SF L,LD are the power transfer distribution factor matrices, which are used to determine the value of P L with respect to a given set of P C , P W and x LD [29]. The symbol • represents the Hadamard product.
The maximum load increase model of ref. [24], Equation (32), maximises δ with δ, P C and P W as decision variables,
where δ is a scaling factor of load. All other variables hold the same meaning as their counterparts in Equation (31). Then a performance function S(x) is defined based on the two OPF models. Finally, I(x) is defined based on S(x) and a threshold φ, which is fixed to 0 in the second stage, but changes adaptively in the first stage. The reason will be given later on.

Constructing and searching for importance sampling probability distribution functions
The IS PDF of x W and x LD is constructed by using Equations (16) and (17), and is denoted as g tr (x LW ) in Equation (35).
The IS PDFs of x L and x C are constructed in the same form as Equations (29) and (30), which are

$$g(x_L) = \prod_{i=1}^{n_L} \eta_{L,i}^{\,1 - x_{L,i}}\,(1 - \eta_{L,i})^{x_{L,i}} \qquad (36)$$

$$g(x_C) = \prod_{j=1}^{n_C} \eta_{C,j}^{\,1 - x_{C,j}}\,(1 - \eta_{C,j})^{x_{C,j}} \qquad (37)$$

where g(x L ) and g(x C ) are the IS PDFs of x L and x C . η L,i and η C,j are the adjusted failure rates of the i-th transmission line and the j-th conventional unit respectively. Searching for the IS PDFs is the optimisation of the parameters of the IS PDFs. As shown in Section 4.3, the optimisation of parameters of continuous variables is an iterative process. Iteration is also needed in searching for the parameters of discrete variables.
The procedures are given below.
Step 1: Use g(x LW ), g(x L ) and g(x C ) to generate m samples of x. In the first iteration, use the original PDFs instead.
If φ is directly fixed as 0, there may be few "I(x) = 1" samples in the first few iterations, because system faults are rather rare. A lack of "I(x) = 1" samples leads to strong fluctuation in the values of IS parameters, which hinders the convergence of TGCE-IS. If φ is adaptively changed and α is set to 0.05, at least 5% of the m samples satisfy "I(x) = 1" in every round, which solves the convergence problem.
where Q(x) is specified as

$$Q(x) = \frac{p(x_{LW})\,p(x_L)\,p(x_C)}{g_{tr}(x_{LW})\,g(x_L)\,g(x_C)}$$

Step 5: If φ≥0, output the IS PDFs and go to the second stage. Otherwise, return to Step 1.
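The adaptive threshold can be illustrated with a short sketch. It assumes, as a labelled guess rather than the paper's stated definition, that a larger S(x) means a more severe state, that I(x) = 1 when S(x) ≥ φ, and that φ is taken as the level exceeded by a fraction α of the current samples, capped at the final level 0. The function names are hypothetical.

```python
import math


def adaptive_threshold(scores, alpha=0.05, final_level=0.0):
    """Threshold such that at least a fraction alpha of the samples
    satisfy score >= threshold, never exceeding the final level."""
    ranked = sorted(scores, reverse=True)
    k = max(1, math.ceil(alpha * len(ranked)))
    return min(final_level, ranked[k - 1])


def indicator(score, phi):
    # Assumed convention: the sample counts as a "hit" when S(x) >= phi.
    return 1 if score >= phi else 0


if __name__ == "__main__":
    # Hypothetical scores of 100 samples, all still on the secure side.
    scores = [float(s) for s in range(-100, 0)]
    phi = adaptive_threshold(scores)
    print(phi, sum(indicator(s, phi) for s in scores))
```

Once enough samples reach the final level, the threshold returned is 0 and the stopping test in Step 5 is met.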

Second stage: Compute reliability indices
The IS PDFs obtained in the first stage are used to draw samples of x, and Equation (6) is performed to estimate PLC. Besides, as mentioned in Section 3.1, EDLC, EDNS and EENS can be similarly estimated using the IS PDFs, which are shown below.
where γ EDLC,IS , γ EDNS,IS and γ EENS,IS are the estimates of EDLC, EDNS and EENS, whose units are hr/yr, MW and MWh/yr, respectively. Then the flowchart of TGCE-IS applied to the power system reliability assessment is given in Figure 4.

Basic settings
The relative error limit of PLC estimated in CC evaluation is 2% for all the following cases. The sample size m in the first stage of TGCE-IS is set to 30,000. All numerical tests are coded with MATLAB 2016a and run on an Intel Core i7-5500U personal computer with 8 GB RAM.

Input data
The input data of the CC evaluation are: 1) The topology of the power system; 2) The historical load profile of the power system; 3) The failure rates of the transmission lines and the conventional units; 4) The historical power output data of wind farms.
These input data are mainly used in the OPF analysis and the construction of the original PDFs of x LW , x L and x C .
The sources of these input data are given below. Input 1)-3): The IEEE-RTS 79 and IEEE-RTS 96 test systems are used to design numerical tests. All the necessary information about Input 1)-3) can be found in refs. [30,31]. Input 4): Three wind farms are considered in the numerical tests, named WF1, WF2 and WF3, respectively. Each of them has an installed capacity of 150 MW. The historical power output data of the three wind farms are obtained from actual wind farms in Northwest China, and given in ref. [32]. The probability histograms of the power output of WF1-WF3 are displayed in Figure 5.

Design of test systems
Two modified test systems (MTS) are defined based on the IEEE-RTS 79 and IEEE-RTS 96 systems for different purposes and used in the following cases. MTS_1: WF1-WF3 are integrated into the IEEE-RTS 79 system and connected to Bus 2, 13 and 21 in order.
MTS_2: WF1-WF3 are integrated into the IEEE-RTS 96 system and connected to Bus 102, 213 and 321 in order.
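The CCM of the load and the power output of WF1-WF3 in MTS_1 and MTS_2 is denoted by

$$\rho = \begin{bmatrix} 1 & \rho_{LW} & \rho_{LW} & \rho_{LW} \\ \rho_{LW} & 1 & \rho_{W} & \rho_{W} \\ \rho_{LW} & \rho_{W} & 1 & \rho_{W} \\ \rho_{LW} & \rho_{W} & \rho_{W} & 1 \end{bmatrix} \qquad (45)$$

where ρ LW is the correlation coefficient between wind farm output and load. ρ W is the correlation coefficient between the wind farms themselves. By changing the values of ρ LW and ρ W , we can specify different CCMs.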

Correlation controlling of historical data
In the following cases, the CCM of wind farm output and load needs to be changed to analyse its impact on the CC of wind farms. However, the original PDF in Equation (28) is constructed by using the historical data of load and wind power output, where the CCM cannot be changed to a specified one directly. Hence, it is necessary to adjust the historical data to make them hold the specified CCM, while keeping the marginal PDFs unchanged.
The correlation controlling process is [33]:
Step 1: The historical data can be denoted by an N×(n LD +n W ) matrix X LW , where N is the number of data points. Then construct a matrix L LW , whose elements in each column are the ranks of the elements in the corresponding column of X LW .
Step 2: Compute the CCM of the historical data and denote it by ρ ori , which is a (n LD +n W )×(n LD +n W ) matrix.
Step 3: Decompose ρ ori by Cholesky factorisation as

$$\rho_{ori} = D D^{T} \qquad (46)$$

where D is a lower triangular matrix.
Step 4: Decompose the specified CCM by Cholesky factorisation as

$$\rho = G G^{T} \qquad (47)$$

where ρ is the specified CCM. G is a lower triangular matrix.
Step 5: Infuse the specified CCM into L LW by

$$L_{LW,new} = L_{LW}\left(G D^{-1}\right)^{T} \qquad (48)$$

where L LW,new is the adjusted version of L LW , whose CCM is ρ.
Step 6: Permute X LW according to the permutation matrix of L LW,new , so that X LW becomes the historical data with the specified CCM ρ.
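The six steps can be sketched in plain Python as below. Two points are assumptions of this sketch: the original CCM is computed from the rank matrix (as in the sampling procedure of the truncated GM model), and the transform in Step 5 is applied row by row as G D⁻¹. All function names are hypothetical, and the data are stored column by column.

```python
import math
import random


def ranks(column):
    """Rank of every element in a column (1 = smallest)."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    r = [0] * len(column)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def corr_matrix(cols):
    n = len(cols[0])
    dev = [[v - sum(c) / n for v in c] for c in cols]
    norm = [math.sqrt(sum(d * d for d in dc)) for dc in dev]
    return [[sum(a * b for a, b in zip(dev[i], dev[j])) / (norm[i] * norm[j])
             for j in range(len(cols))] for i in range(len(cols))]


def cholesky(a):
    """Lower-triangular factor of a positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward_solve(low, b):
    """Solve low @ y = b for a lower-triangular matrix."""
    y = []
    for i, row in enumerate(low):
        y.append((b[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y


def impose_correlation(cols, target_ccm):
    rank_cols = [ranks(c) for c in cols]                     # Step 1
    d = cholesky(corr_matrix(rank_cols))                     # Steps 2-3
    g = cholesky(target_ccm)                                 # Step 4
    n_rows, n_cols = len(cols[0]), len(cols)
    scores = [[0.0] * n_rows for _ in range(n_cols)]
    for r in range(n_rows):                                  # Step 5
        y = forward_solve(d, [rank_cols[c][r] for c in range(n_cols)])
        for c in range(n_cols):
            scores[c][r] = sum(g[c][k] * y[k] for k in range(c + 1))
    out = []
    for c in range(n_cols):                                  # Step 6
        new_rank = ranks(scores[c])
        sorted_vals = sorted(cols[c])
        out.append([sorted_vals[new_rank[r] - 1] for r in range(n_rows)])
    return out
```

Because Step 6 only reorders the values within each column, the marginal distributions of the historical data are left unchanged while the rank correlation moves towards the specified CCM.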

Validation of truncated Gaussian mixture model based cross entropy based importance sampling
To validate the correctness and efficiency of TGCE-IS in solving the problems of CE-IS, we design two cases based on MTS_1. In these two cases, only reliability indices are computed and compared, while the CC evaluation of the three wind farms is performed later on. Besides, we adopt three CE-IS methods in refs. [20,21,24] to compare with TGCE-IS, namely CE-IS_1, CE-IS_2 and CE-IS_3.

Case 1-Wind farms and load being independent
The values of ρ_LW and ρ_W in Equation (45) are both set to 0.
PLC and EDNS are computed by all the methods and shown in Table 1. The convergence curves of the relative error of PLC for the three CE-IS methods and TGCE-IS in Case 1 are plotted in Figure 6.
The results in Table 1 and Figure 6 show that: 1) PLC and EDNS computed by TGCE-IS are close to those computed by MCS and the CE-IS methods, which verifies the correctness of TGCE-IS. 2) All the CE-IS methods and TGCE-IS are far more efficient than MCS. 3) When the wind power output and load are independent, the efficiencies of the three CE-IS methods are almost the same, while TGCE-IS is about twice as fast as each of them. This is because TGCE-IS adopts the non-parametric GM model as the IS PDF of wind power output and load, which addresses the first limitation of CE-IS stated in Section 3.3.
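In Monte Carlo reliability assessment, the relative error tracked in convergence curves of this kind is commonly the coefficient of variation of the estimator. The sketch below shows how it can be computed for a generic importance sampling estimate. It assumes that definition, which this section does not spell out, and it uses a one-dimensional Gaussian tail probability as a toy stand-in for a power system model. All names are illustrative.

```python
import math

import numpy as np


def is_estimate(indicator, weights):
    """Importance sampling estimate and its coefficient of variation.

    indicator: 0/1 array, 1 where the sampled state is a failure state
    weights:   likelihood ratios f(x) / g(x) of the sampled states
    """
    terms = indicator * weights
    n = terms.size
    mean = terms.mean()
    rel_err = terms.std(ddof=1) / (math.sqrt(n) * mean) if mean > 0 else math.inf
    return mean, rel_err


def normal_logpdf(x, mu):
    """Log density of a unit-variance normal distribution with mean mu."""
    return -0.5 * (x - mu) ** 2 - 0.5 * math.log(2.0 * math.pi)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    threshold, shift, n = 4.0, 4.0, 20000
    x = rng.normal(shift, 1.0, n)  # samples from the proposal g = N(4, 1)
    # Likelihood ratio f/g with the original density f = N(0, 1)
    w = np.exp(normal_logpdf(x, 0.0) - normal_logpdf(x, shift))
    est, rel_err = is_estimate((x > threshold).astype(float), w)
    exact = 0.5 * math.erfc(threshold / math.sqrt(2.0))
    print(est, rel_err, exact)
```

A method that reaches a given relative error with fewer samples is the more efficient one, which is the sense in which sample sizes are compared here.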

Case 2-Wind farms and load being correlated
The values of ρ_LW and ρ_W in Equation (45) are both set to 0.8.
PLC and EDNS are computed by all the methods and shown in Table 2. From Tables 1 and 2, it can be observed that: 1) The efficiency of TGCE-IS is almost impervious to the correlation between load and wind power. 2) The sample sizes of the three CE-IS methods become much larger when load and wind power are highly correlated. 3) The advantage of TGCE-IS over the three CE-IS methods is larger in this case than in Case 1. This is because TGCE-IS handles the correlation between the wind power output and load better than the traditional CE-IS methods, which addresses the second limitation of CE-IS stated in Section 3.3.
The convergence curves of the relative error of PLC for the three CE-IS methods and TGCE-IS in Case 2 are plotted in Figure 7. TGCE-IS converges clearly faster than all three CE-IS methods.

Investigation of influencing factors of capacity credit
In this section, the CC of wind farms is computed by using PLC as the reliability index, which is estimated by TGCE-IS.

Correlation between load and power output of wind farms
MTS_1 is used in this case. To investigate the impact of the correlation between load and wind power output on CC, ρ_W is fixed to 0. To keep ρ in Equation (45) positive semi-definite, ρ_LW is set to −0.5, 0 and 0.5 in turn.
In this case, the CC values of WF1-WF3 are shown in Figure 8. According to Figure 8, the CC values of the wind farms are strongly affected by ρ_LW: a positive correlation between wind power and load increases the CC of the wind farms significantly.
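The admissible range of ρ_LW for ρ_W = 0 can be worked out explicitly. The derivation below assumes that the CCM in Equation (45) is a 4 × 4 matrix for one aggregate load variable and three wind farms, with every load-wind entry equal to ρ_LW. This structure is inferred from the definitions of ρ_LW and ρ_W and is not confirmed by the source.

```latex
\rho\,\big|_{\rho_W = 0} =
\begin{bmatrix}
1         & \rho_{LW} & \rho_{LW} & \rho_{LW} \\
\rho_{LW} & 1         & 0         & 0         \\
\rho_{LW} & 0         & 1         & 0         \\
\rho_{LW} & 0         & 0         & 1
\end{bmatrix},
\qquad
\det(\rho - \lambda I) = (1-\lambda)^{2}\left[(1-\lambda)^{2} - 3\rho_{LW}^{2}\right]

\lambda \in \left\{\, 1,\; 1,\; 1 + \sqrt{3}\,\lvert\rho_{LW}\rvert,\; 1 - \sqrt{3}\,\lvert\rho_{LW}\rvert \,\right\}
\quad\Longrightarrow\quad
\rho \succeq 0 \iff \lvert\rho_{LW}\rvert \le \tfrac{1}{\sqrt{3}} \approx 0.577
```

Under this assumption, the values −0.5, 0 and 0.5 all give a valid correlation matrix, whereas a value such as 0.8 would not when ρ_W = 0.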

6.3.2 Share of wind energy in generating capacity
MTS_1 is used in this case. To investigate the impact of the share of wind energy in generating capacity on CC, ρ_W is fixed to 0 and ρ_LW is fixed to 0.5.
WF1-WF3 are treated as a whole when computing their CC. To adjust the share of wind energy in generating capacity, the total installed capacity of WF1-WF3 is varied from 450 MW to 900 MW.
In this case, the CC of the total wind energy of WF1-WF3 is shown in Figure 9.
According to Figure 9, the CC of WF1-WF3 declines as the share of wind energy in generating capacity increases.

Power system adequacy level
MTS_1 is used in this case. To investigate the impact of the power system adequacy level on CC, ρ_W is fixed to 0 and ρ_LW is fixed to 0.5. The total installed capacity of WF1-WF3 is fixed to 450 MW. To adjust the adequacy of the power system, the peak load is varied from 2850 MW to 3400 MW. WF1-WF3 are again treated as a whole when computing CC.
In this case, the CC of WF1-WF3 is shown in Figure 10. According to Figure 10, the CC of WF1-WF3 declines as the peak load increases. This means that in a less reliable power system, the CC of wind energy is correspondingly smaller.

6.4 Application of truncated Gaussian mixture model based cross entropy based importance sampling based capacity credit evaluation method in large-scale power systems
According to Tables 3 and 4, it is concluded that: 1) In a larger power system, the sample sizes of all the methods become larger accordingly; 2) The sample sizes of the CE-IS methods become larger when the power output of wind farms is correlated with the load, because the correlation reduces their efficiency; 3) The sample size of TGCE-IS is scarcely affected by the correlation between the load and wind power output. Hence, TGCE-IS remains much more efficient than traditional CE-IS even in a much larger and more reliable power system.
Finally, the CC values of the three wind farms in Case 3 and Case 4 are computed and given in Table 5.
The CC values of WF1-WF3 in Table 5 are nearly the same as those in Figure 8 under the same ρ_LW.
In Section 6.3.3, it was observed that the CC of wind farms decreases when the power system reliability declines. However, according to Table 5 and Figure 8, it would be wrong to conclude that the more reliable the power system is, the larger the CC of the wind farms. Rather, there is an upper limit of CC for a given profile of wind energy source.

DISCUSSION OF RESULTS
Based on the numerical tests, it can be found that: 1) According to Tables 1 and 2 and Figures 6 and 7, TGCE-IS is more efficient than traditional CE-IS methods regardless of the correlation between wind power output and load. Moreover, the higher the correlation between the wind power output and the load, the larger the speed advantage of TGCE-IS over the traditional CE-IS methods in composite power system reliability assessment. 2) According to Figure 8, the CC of wind farms increases with the correlation between the wind power output and load. 3) According to Figure 9, the CC of the wind farms declines as the share of wind energy in the generating capacity increases. 4) According to Figure 10, the CC of the wind farms declines as the power system reliability declines. 5) According to Tables 3 and 4, the proposed TGCE-IS is still effective in large power systems. In addition, a comparison of Table 5 and Figure 8 shows that there is an upper limit of CC for a given profile of wind energy source.

CONCLUSION
When the wind power output and the load are highly correlated, the traditional CE-IS methods used in the CC evaluation of wind farms have two problems: 1) Choosing a parametric form for IS PDFs in advance limits the efficiency of CE-IS. 2) The measures to address the correlation between variables need improvement to avoid the efficiency decline of CE-IS when some variables are highly correlated.
Therefore, a solution named TGCE-IS is developed to solve the two problems. The numerical tests show that TGCE-IS is superior to traditional CE-IS in power system reliability assessment, regardless of the correlation structure of wind power output and load.
By adopting TGCE-IS to compute reliability indices in the CC evaluation, the whole evaluation process becomes faster and steadier, which facilitates the numerical tests for investigating various factors that affect the CC of wind energy. Among these factors, it is observed that the higher the correlation between the wind power output and load, the larger the CC of wind farms. Hence, given the rapidly increasing share of wind energy in power systems, techniques that increase the correlation between wind power and load should be developed to improve the CC of wind energy.