Edgeworth expansions for two-stage sampling with applications to stratified and cluster sampling

A two-term Edgeworth expansion for the standardized version of the sample total in a two-stage sampling design is derived. In particular, for the commonly used stratified and cluster sampling schemes, formal two-term asymptotic expansions are obtained for the Studentized versions of the sample total. These results are applied in conjunction with the bootstrap to construct more accurate confidence intervals for the unknown population total in such sampling schemes. The Canadian Journal of Statistics 43: 578–599; 2015


INTRODUCTION
A common sampling strategy for surveying finite populations is to select the sampled units in several stages. Multistage sampling refers to sampling plans in which the selection of units is carried out in stages, using smaller and smaller subunits at each stage. For instance, to do a national survey on unemployment or to conduct an opinion poll on a given topic, one may select certain states, then certain counties within those states, then cities within these counties, and so on, in order to draw the final sampling units. In particular, in a two-stage sampling design, the population is divided into several "primary" units from which a sample of primary units is selected, and then a sample of secondary units is selected within each selected primary unit. In each stage we use simple random sampling without replacement (SRSWOR), in which a sample is taken without replacement and any sample of a given size has the same probability of being chosen. It should be noted that two of the most popular sampling designs, stratified sampling and cluster sampling, can be viewed as special cases of such two-stage sampling. A stratified random sample is a complete census of the primary units (the strata) followed by a sample of the secondary units within each primary unit. A cluster sample, on the other hand, is a sample of the primary units (the clusters) followed by a complete census of the secondary units within each selected primary unit. This allows considerable flexibility depending on the homogeneity of units at the primary or secondary level, as well as cost considerations.
Our aim in this paper is to study the asymptotic normality and develop Edgeworth expansions for the cumulative distribution function of the estimator of the population total (or the mean) in two-stage sampling, and then specialize these results to the cases of stratified and cluster sampling. Such results, although very important and useful, are rather non-trivial because of the complex probability distribution on the selected subset of units induced by the sampling design. Large sample properties and statistical inferences for estimators in the context of finite populations are considerably more involved than in the independent and identically distributed (i.i.d.) case because they depend not only on the characteristics of the finite population but also on the sampling design employed. For instance, the choice of units in stratified simple random sampling without replacement corresponds to a multivariate hypergeometric distribution, which can be viewed as independent binomials (with the same probability of selection) conditional on their sum. Such sampling schemes may be viewed as draws in "generalized urn models" and the resulting estimate of the population total as a sum of functions of the resulting frequencies in such a context. This enables us to exploit some of the machinery developed in Mirakhmedov, Jammalamadaka, & Ibrahim (2014), as we do in Section 2.
The main objective is to make inferences on the parameters of the finite population using a sample selected from the finite population according to a specified probability sampling design. Even in simple situations, the exact distribution of the relevant estimators can be too complex to be determined analytically, and large sample theory and approximations provide a useful alternative for making such inferences. In this paper we shall consider two-stage designs, where it is assumed that population size as well as the sample sizes in each stage are sufficiently large.
One of the most commonly estimated finite population parameters is the "population total," denoted by $Y$, or the corresponding population mean. The distribution of the estimator used for this purpose, say $\hat{Y}$ (see Section 2), can be approximated by a Gaussian distribution under fairly general conditions, which also allows us to set confidence intervals for large samples as is usually done.
One of our primary goals in this paper is to obtain better approximations for this large sample distribution by studying the Edgeworth asymptotic expansion for this estimator. Not only are such analytical expressions of interest in their own right, but they also allow us to provide more accurate confidence intervals for the unknown population total (or mean), as we show. Under the single-stage SRSWOR design, asymptotic results for this estimator, which coincide with corresponding results on the sample mean from a finite population of real numbers, are well studied in the literature: see, e.g., Erdös & Rényi (1959), Hájek (1960), Scott & Wu (1981), Robinson (1978), Bickel & van Zwet (1978), Sugden & Smith (1997), and Bloznelis (2000). For results on stratified and cluster sampling, see Rao (1973), Cochran (1977), Sen (1988), Schenker & Welsh (1988), Krewski & Rao (1981), and Bickel & Freedman (1984), whereas Hájek (1964) and Prášková (1984) discuss results for unequal probability sampling. As we show in Section 2, the estimator under a two-stage scheme can be reduced to a weighted sample mean from a finite population of "random" variables. Asymptotic normality and Edgeworth asymptotic expansions for this sample mean have been considered by von Bahr (1972), Mirakhmedov (1983), Hu, Robinson, & Wang (2007), Mirakhmedov, Jammalamadaka, & Ibrahim (2014), and Ibrahim & Mirakhmedov (2013).
Our second aim is to focus specifically on the stratified and cluster sampling designs. We first apply the general Edgeworth asymptotic expansion result by writing down the first two terms in the expansion for standardized versions of the estimators. However, their practical use is limited, as the standard deviation of these statistics is not known in practice. We hence derive formally a two-term asymptotic expansion for the Studentized version of these estimators; the result for the stratified sampling design extends corresponding results by Babu & Singh (1985) and Sugden & Smith (1997), who considered a sample mean in the one-stage SRSWOR design. Finally, our third aim is to apply the theoretical results obtained here to construct skewness-adjusted confidence intervals for the unknown population total. Extensive Monte Carlo studies indicate that the additional terms in the asymptotic expansion do indeed provide better results than the usual confidence intervals based on the normal approximation, especially when combined with bootstrap methods.
A note about notation: φ(x) and Φ(x) will denote the density and distribution functions of the standard normal distribution, respectively. c and C will denote some absolute finite constants and θ will denote a number whose absolute value does not exceed 1. It should be observed that c, C, and θ may be different when used in different equalities and inequalities or in different parts of the same equality or inequality. The main results are to be found in Theorems 1 and 2, as well as in Corollary 1, given in Sections 2 and 3. The long proofs of these two theorems are given in the Appendix. Construction of confidence intervals is discussed in Section 4, which contains two short Monte Carlo simulation studies as well.

ESTIMATORS IN TWO-STAGE SAMPLING
Suppose the population consists of $N$ primary units, with the $j$th primary unit consisting of $M_j$ secondary units. The value of the $k$th secondary unit within the $j$th primary unit is denoted by $y_{jk}$. Thus for the $j$th primary unit, the total and the mean are given by
$$Y_j = \sum_{k=1}^{M_j} y_{jk}, \qquad \bar{Y}_j = Y_j / M_j.$$
The quantity of interest that is to be estimated is the "population total,"
$$Y = \sum_{j=1}^{N} Y_j = \sum_{j=1}^{N} \sum_{k=1}^{M_j} y_{jk},$$
and we denote the mean per primary unit by $\bar{Y} = Y/N$. We draw a simple random sample without replacement at each stage: a simple random sample $s$ of $n$ primary units out of $N$, and a simple random sample $s_j$ of $m_j$ secondary units out of $M_j$ within each primary unit $j \in s$. Let
$$f_1 = n/N, \qquad f_{2j} = m_j / M_j \tag{1}$$
represent the sampling fractions in these two stages, with the notation $g_1 = 1 - f_1$ and $g_{2j} = 1 - f_{2j}$. Denote the sample mean in the selected $j$th primary unit by $\bar{y}_j$, and let $y_j = m_j \bar{y}_j = \sum_{k \in s_j} y_{jk}$ be the corresponding sample total. The unbiased estimator of $Y_j$ is given by
$$\hat{Y}_j = M_j \bar{y}_j = y_j / f_{2j},$$
and the unbiased estimator of the population total $Y$ is
$$\hat{Y}_{ts} = \frac{N}{n} \sum_{j \in s} \hat{Y}_j, \tag{3}$$
the subscript "ts" denoting two-stage, to distinguish it from other estimators coming later. We are interested in the Edgeworth expansion for $\hat{Y}_{ts}$. In order to derive the expansion, we will use a somewhat different interpretation of the estimator $\hat{Y}_{ts}$. This will be done by reversing the two stages in the sampling plan. Now, in the first stage, a simple random sample $s_j$ of $m_j$ secondary units is drawn within each primary unit. As before we form the estimators $\hat{Y}_j$, but this time for $j = 1, \ldots, N$ (rather than for $j \in s$). In the second stage, a simple random sample $s$ of size $n$ is drawn from the "population" consisting of the "units" $\hat{Y}_1, \ldots, \hat{Y}_N$, and thereafter we form the estimator
$$\hat{Y}'_{ts} = \frac{N}{n} \sum_{j \in s} \hat{Y}_j.$$
It is clear that $\hat{Y}_{ts}$ and $\hat{Y}'_{ts}$ are equal in distribution, and that $f_1 \hat{Y}'_{ts}$ is a sample sum from a finite population of independent random variables. Let [...] and [...], and note that $\sigma^2_{ts}$ is the asymptotic variance of $n^{-1/2} \hat{Y}_{ts}$. We wish to derive a suitable upper bound for [...] by utilizing Theorem 2 of Hu, Robinson, & Wang (2007).
For doing this, we will need a technical condition, which ensures that the values y jk in each secondary unit do not cluster around too few values (cf. Robinson (1978), Bickel & van Zwet (1978), and Mirakhmedov (1983), where similar conditions are used). This technical condition, Condition C, is given in the Appendix. Let

and
= n 1/2 log n exp where and δ are positive constants defined in Condition C, and c( , δ) is a positive constant depending only on and δ. We then have the following main result of this section whose proof is given in the Appendix.

Theorem 1. If Condition C is fulfilled, then there exists a positive constant c such that
where Y is the population total, and f 1 ,Ŷ ts , σ 2 ts , ts (u), ρ, and are defined in (1), (3), (5), (6), (8), and (9), respectively. Remarks 1. The quantity = (m j , M j , n, N) is exponentially decreasing when m j g 2j is increasing; the latter being a necessary condition forŶ j to be asymptotically normal. In addition, = (m j , M j , n, N) is exponentially small if m j g 2j < 1/8.

Remarks 2. The quantities $\alpha_{20}$, $\alpha_{21}$, and $\alpha_{30}$ appearing in definition (6) can be computed through the following formulas: [...] and [...] where, using (10), $\sigma^2_{ts}$ given in (5) can be rewritten as [...]
Remarks 3. To limit complexity, we have restricted ourselves to a two-term expansion in Theorem 1. However, higher order terms can be obtained similarly to the results in Ibrahim & Mirakhmedov (2013).
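As a concrete illustration of the estimator discussed in this section, the following Python sketch draws one two-stage SRSWOR sample and evaluates the total estimate as N/n times the sum, over the sampled primary units, of M_j times the within-unit sample mean. The function name `two_stage_total`, its arguments, and the toy population are illustrative assumptions and are not taken from the paper; NumPy is assumed to be available.

```python
import numpy as np


def two_stage_total(population, n, m, rng):
    """Draw one two-stage SRSWOR sample and return the estimated population total.

    population : list of 1-D arrays; population[j] holds the values y_jk of primary unit j
    n          : number of primary units to sample (n <= N)
    m          : sequence of second-stage sample sizes m_j (m_j <= M_j), indexed by j
    rng        : numpy.random.Generator
    """
    N = len(population)
    s = rng.choice(N, size=n, replace=False)              # first-stage sample of primary units
    total = 0.0
    for j in s:
        y = np.asarray(population[j], dtype=float)
        M_j = y.size
        s_j = rng.choice(M_j, size=m[j], replace=False)   # second-stage sample within unit j
        total += M_j * y[s_j].mean()                      # estimated unit total: M_j * ybar_j
    return N / n * total                                  # scale up by the inverse of f_1 = n/N


# Hypothetical toy population with N = 3 primary units.
rng = np.random.default_rng(0)
population = [np.arange(1.0, 6.0), np.array([10.0, 20.0, 30.0]), np.array([2.0, 4.0, 6.0, 8.0])]
estimate = two_stage_total(population, n=2, m=[2, 2, 2], rng=rng)
```

Calling the function with n = N reproduces stratified sampling, and calling it with m_j = M_j for every j reproduces cluster sampling, so the same sketch covers both special cases.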

STUDENTIZED ESTIMATORS IN STRATIFIED AND CLUSTER SAMPLINGS
We now consider two important special cases, namely: (i) stratified random sampling and (ii) cluster sampling. We continue to use the notations of Section 2.
The Canadian Journal of Statistics / La revue canadienne de statistique DOI: 10.1002/cjs
(i) Stratified random sampling is a special case of two-stage sampling where $n = N$. Hence for this case, [...] and $\rho$ is defined as in (8).
(ii) Cluster sampling is a special case of two-stage sampling with $m_j = M_j$, $j = 1, \ldots, N$. Hence for this case, [...] $6(ng_1)^{1/2} \alpha_{02}^{3/2}$, and $\rho = \alpha_{04}$.
Although the foregoing Edgeworth expansions obtained from Theorem 1 provide better approximations than the Central Limit Theorem for the standardized statistics, they are not useful in practice as the variances of the statistics under consideration are not known. Due to this fact we shall now obtain formally a two-term expansion for the Studentized versions of our statistics, both for the stratified and cluster sampling designs.
For the population total, $Y$, the estimator used in stratified random sampling is
$$\hat{Y}_{str} = \sum_{j=1}^{N} \hat{Y}_j = \sum_{j=1}^{N} M_j \bar{y}_j. \tag{13}$$
Let $m = \sum_{j=1}^{N} m_j$. The variance of $\hat{Y}_{str}$ is [...] which is unbiasedly estimated by [...] We now define the Studentized version of the estimator $\hat{Y}_{str}$ as
$$T_{str} = m^{-1/2} S_{str}^{-1} (\hat{Y}_{str} - Y),$$
and seek a two-term Edgeworth expansion, [...] (15)
The statistic $\hat{Y}_{str}$, defined in (13), can be viewed as a sum of independent but non-identically distributed random variables $\hat{Y}_j = y_j / f_{2j}$, $j = 1, \ldots, N$, where $y_j$ is the sample sum in stratum $j$. Hence the Studentized version $T_{str}$ of $\hat{Y}_{str}$ is a special case of Student's t-statistic based on independent but non-identically distributed random variables. Many authors have studied the problem of approximating the distribution of t-statistics by a normal distribution; see Hall & Wang (2004) and the references therein. In particular, Bentkus, Bloznelis, & Götze (1996) gave a Berry-Esseen bound for Student's t-statistic in the case of non-identically distributed data. From their Theorem 1.1, together with inequality (A4) in the Appendix of the current paper, we have [...] (16)
where $\rho$ is defined as in (8). The right-hand side of (16) is of order $O(N^{-1/2})$ under some additional restrictions, e.g., under the following condition:
Condition A: $f_{2j}$ and $m N^{-1} \sigma^2_{str}$ are bounded away from zero, whereas $\nu_{4j}$ and $N^{-1} \sum_{j=1}^{N} m_j g_{2j}$ are bounded away from infinity.
To the best of our knowledge, existing results on Edgeworth expansions for Student's t-statistic consider the independent and identically distributed case only; we refer to Hall (1987) and Hall & Wang (2004), who have established Edgeworth expansion results in this case under different refinements. From their results, it follows that, if Cramér's continuity condition $\limsup_{|t| \to \infty} |E[e^{itX}]| < 1$ holds, where $X$ is the observed random variable, then, under some moment conditions, the remainder term of the two-term Edgeworth expansion is $O(N^{-1})$. Extending such results to non-identically distributed data is a difficult probabilistic problem, which we do not consider here. Instead we obtain the second term of the Edgeworth expansion in (15), see Theorem 2, using a procedure somewhat similar to that outlined in Sugden & Smith (1997, Section 2). The details are given in the Appendix.
Theorem 2. [...] (15), we have [...]
The above theorem is sufficient to establish (15) formally. In addition, we conjecture that Equation (15), with respect to the remainder term, holds true under Conditions A and C, because in this case Cramér's continuity condition, $\limsup_{|t| \to \infty} |E[e^{it\hat{Y}_j}]| < 1$, follows from inequality (A9) in the Appendix.
The case when $N = 1$, with just one primary unit, corresponds to the simple random sampling design considered by Sugden & Smith (1997), and their two-term expansion result follows from our theorem above. For this case, we have [...] which is the Studentized version of the estimator $\hat{Y}_1$ of the population total $Y_1$ in simple random sampling from a population of size $M_1$, and the two-term Edgeworth expansion for [...]
As yet another special case of two-stage sampling, we now consider the case of cluster sampling, which corresponds to taking $m_j = M_j$ for $j = 1, \ldots, N$. For the population total, $Y$, the estimator used in cluster sampling is [...] For this case, we define the Studentized version as [...] and seek a two-term Edgeworth expansion, [...] (19)
By noting that cluster sampling can be viewed as simple random sampling from a "population" consisting of the "units," i.e., cluster totals, $Y_1, \ldots, Y_N$, and by (18), we obtain the following result.
[...] (19), we have $n^{-1/2} \psi_n(u) = \frac{1}{6(ng_1)^{1/2}} \frac{\alpha_{03}}{\alpha_{02}^{3/2}}$ [...]
Remarks 4. By using results from Babu & Singh (1985, p. 265) [...]
Remarks 5. The estimators $\hat{\alpha}_{0r}$ and $\hat{\nu}_{rj}$ are $\sqrt{n}$- and $\sqrt{m_j}$-consistent, respectively, i.e., for every $\varepsilon > 0$ there exists a $c > 0$ such that [...] That (21) holds true is a direct consequence of Chebyshev's inequality together with the inequality $\mathrm{var}(\hat{\alpha}_{0r}) \le c \mu_{2r} / n$, $r = 2, 3$. For $r = 2$, the latter inequality follows from an application of Theorem 1 in Cho, Cho, & Eltinge (2005) and the fact that $\mu_l \le \mu_k^{l/k}$, $1 \le l \le k$, and for $r = 3$, from routine computations like those in the proof of Theorem 1 in Cho, Cho, & Eltinge (2005). The same reasoning shows that (22) holds true when $r = 2, 3$.
Fletcher & Webster (1996) note that clumping in the spatial distribution of animal and plant species can lead to a high degree of skewness in the population, which can then carry over to the distribution of a stratified sample mean. For this reason, they studied several ways of calculating skewness-adjusted confidence intervals for population means from stratified random samples under the assumption that the sampling fraction in each stratum is negligible (i.e., that $m_j \ll M_j$ for all $j$). The set-up of our first Monte Carlo study will be similar to that in Fletcher & Webster (1996). By using the result of Theorem 2, we will show that Fletcher & Webster's conclusions can be extended to the case where the sampling fractions are not negligible. Our second Monte Carlo study uses data from the U.S. 1992 Census of Agriculture.

CONFIDENCE INTERVALS AND MONTE CARLO SIMULATIONS
As in Fletcher & Webster (1996), we will consider four different methods for computing confidence intervals; in our case, for the population total, $Y$. The first one is based on the usual normal approximation of the distribution of $T_{str} = m^{-1/2} S_{str}^{-1} (\hat{Y}_{str} - Y)$, the Studentized version of the estimated population total, $\hat{Y}_{str}$, in stratified random sampling. That is, this $100(1 - \alpha)\%$ interval is given by
$$(\hat{Y}_{str} - z_\alpha m^{1/2} S_{str},\ \hat{Y}_{str} + z_\alpha m^{1/2} S_{str}), \tag{23}$$
where $z_\alpha$ is the upper $\alpha/2$ percentile of the standard normal distribution.
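The normal-approximation interval can be sketched in a few lines of Python. The sketch assumes that the squared standard error of the estimated total is the textbook unbiased variance estimator for a stratified total, namely the sum over strata of M_j^2 (1 - f_2j) s_j^2 / m_j, with s_j^2 the within-stratum sample variance; the function name `nt_interval` and its arguments are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def nt_interval(samples, M, alpha=0.05):
    """Normal-approximation interval for a population total under stratified SRSWOR.

    samples : list of 1-D sequences; samples[j] holds the sampled values in stratum j
    M       : stratum sizes M_j
    """
    y_hat = 0.0
    var_hat = 0.0
    for y, M_j in zip(samples, M):
        y = np.asarray(y, dtype=float)
        m_j = y.size
        f_2j = m_j / M_j                                          # sampling fraction in stratum j
        y_hat += M_j * y.mean()                                   # estimated stratum total
        var_hat += M_j**2 * (1.0 - f_2j) * y.var(ddof=1) / m_j    # assumed variance estimator
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)                   # upper alpha/2 normal percentile
    half_width = z * math.sqrt(var_hat)
    return y_hat - half_width, y_hat + half_width


# Hypothetical sample from two strata of sizes 40 and 60.
lower, upper = nt_interval([[3.0, 5.0, 4.0, 8.0], [1.0, 0.0, 2.0, 1.0, 6.0]], [40, 60])
```

The finite population correction 1 - f_2j makes the interval collapse to a single point when every stratum is fully enumerated, which is a quick sanity check on the variance term.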
In order to make use of the Edgeworth expansion in Theorem 2, we will use Hall's (1992, p. 123) idea to find an invertible cubic transformation, $F_{str} = f(T_{str})$, whose coefficients depend on the form of (17), such that the distribution of $F_{str}$ is closer to the standard normal distribution than is the distribution of $T_{str}$ (cf. Fletcher & Webster, 1996). Direct application of Hall's idea gives the following cubic transformation: [...], and $\hat{\nu}_{rj}$, $r = 2, 3$, is defined in (20). The inverse of the aforementioned transformation, needed for computing the confidence interval, is [...] and the corresponding $100(1 - \alpha)\%$ interval is then given by [...]
Remarks 6. By Theorem 2 we have $P\{T_{str} < u\} = \Phi(u) + m^{-1/2} (\gamma_1 u^2 + \gamma_2) \varphi(u) + r_m$, where $r_m$ is the remainder in (15), and $\gamma_1$ and $\gamma_2$ are defined as $\hat{\gamma}_1$ and $\hat{\gamma}_2$, respectively, but with $\hat{\nu}_{2j}$ and $\hat{\nu}_{3j}$ replaced by $\nu_{2j}$ and $\nu_{3j}$. Let $\tilde{m} = \min_{1 \le j \le N} m_j$. It is easy to see that $\hat{\gamma}_1$ and $\hat{\gamma}_2$ are $\tilde{m}^{1/2}$-consistent estimators of $\gamma_1$ and $\gamma_2$, respectively (cf. Remark 5). Following Hall (1992, pp. 122-123), [...]. From this relation and a similar one for the case $\hat{\gamma}_1 < 0$, we deduce as in Hall (1992, p. 123) that if $\hat{\gamma}_1 \ne 0$, then [...] (25)
The disadvantage of the quadratic transformation $g$ is that it is generally not one-to-one, and this is why we use the aforementioned cubic transformation $f$ instead. As the added term in the cubic transformation $f$ is of order $m^{-1}$, formula (25) will not be affected if we replace transformation $g$ with $f$. Thus, $f(T_{str})$ admits an Edgeworth expansion in which the second term is $O((m \tilde{m})^{-1/2}) + r_m$ rather than $O(m^{-1/2})$.
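To make the transformation idea concrete, here is a minimal Python sketch of a Hall-type monotone cubic and its closed-form inverse, built from the expansion quoted in Remarks 6 with second-order term m^{-1/2}(γ1 u² + γ2). It assumes the generic construction f(t) = t + a t² + a² t³/3 + b with a = m^{-1/2} γ1 and b = m^{-1/2} γ2, which may differ in presentation from the paper's own displayed formulas; all function names are hypothetical, and the interval function is one plausible way to invert the transformation.

```python
from statistics import NormalDist

import numpy as np


def cubic_transform(t, gamma1, gamma2, m):
    """Monotone cubic f(t) = t + a t^2 + a^2 t^3 / 3 + b, with a = gamma1/sqrt(m), b = gamma2/sqrt(m)."""
    a = gamma1 / np.sqrt(m)
    b = gamma2 / np.sqrt(m)
    return t + a * t**2 + a**2 * t**3 / 3.0 + b


def cubic_inverse(x, gamma1, gamma2, m):
    """Inverse of cubic_transform, using f(t) = ((1 + a t)^3 - 1) / (3a) + b."""
    a = gamma1 / np.sqrt(m)
    b = gamma2 / np.sqrt(m)
    if a == 0:
        return x - b                      # the transformation reduces to a shift
    return (np.cbrt(1.0 + 3.0 * a * (x - b)) - 1.0) / a


def transformed_interval(y_hat, se, gamma1, gamma2, m, alpha=0.05):
    """Interval for the total obtained by inverting -z < f(T) < z, where T = (y_hat - Y) / se."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    lower = y_hat - se * cubic_inverse(z, gamma1, gamma2, m)
    upper = y_hat - se * cubic_inverse(-z, gamma1, gamma2, m)
    return float(lower), float(upper)
```

Because the derivative of f is (1 + a t)², the cubic is non-decreasing everywhere, which is what makes the inversion well defined where the quadratic version is not.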
The next and final two confidence intervals use the bootstrap to estimate the distributions of $T_{str}$ and $F_{str}$, respectively. Ordinary bootstrap (applied independently within each stratum) is not recommended, as it involves with-replacement samples of size $m_j$ in stratum $j$ and so does not mimic how the original data were sampled. One way to deal with this is to use the population bootstrap, as described in, e.g., Davison & Hinkley (1997, Section 3.7). Then, in the $j$th stratum and if $l_j = M_j / m_j$ is an integer, a fake stratum of size $M_j$ is constructed by replicating each sample observation $y_{jk}$, $k \in s_j$, $l_j$ times, and a bootstrap replicate of $\{y_{jk}; k \in s_j\}$ is generated by taking a sample of size $m_j$ without replacement from the constructed fake stratum. More generally, let $l_j$ be the integer part of $M_j / m_j$ and $d_j = M_j - l_j m_j$. Then a fake stratum is obtained by taking $l_j$ copies of each $y_{jk}$, $k \in s_j$, and adding to them a sample of size $d_j$ taken without replacement from $y_{jk}$, $k \in s_j$. By applying the population bootstrap independently within each stratum, a bootstrap replicate of $\{y_{jk}; k \in s_j, j = 1, \ldots, N\}$ is obtained. [...] $T_{str,b} = m^{-1/2} S_{str,b}^{-1} (\hat{Y}_{str,b} - \hat{Y}_{str})$, and a $100(1 - \alpha)\%$ confidence interval for the population total is given by [...] where $t_L$ and $t_U$ are the lower and upper $\alpha/2$ percentiles of the empirical distribution of $\{T_{str,b}\}_{b=1}^{B}$. Likewise, the bootstrap version of $F_{str}$ is defined as [...] where $\hat{\gamma}_{1,b}$ and $\hat{\gamma}_{2,b}$ are constructed in the same way as $\hat{\gamma}_1$ and $\hat{\gamma}_2$ are constructed from the original stratified sample. A $100(1 - \alpha)\%$ confidence interval for the population total is then obtained as [...] where $f_L$ and $f_U$ are, respectively, the lower and upper $\alpha/2$ percentiles of the empirical distribution of $\{F_{str,b}\}_{b=1}^{B}$.
We henceforth will refer to the confidence intervals (23)-(27) as the NT, NF, BT, and BF intervals, respectively. In the Monte Carlo study, three basic types of survey are considered for each total sample size $m$.
Type I consists of as many strata as possible (for computing the $\hat{\nu}_{3j}$, at least three units are needed in each stratum), type III has only two strata, and type II is chosen as an intermediate between type I and type III. For every type of survey and sample size considered, half the strata are "high density," consisting in total of 120 units, and half "low density," consisting in total of 1,080 units. The values of the population units were generated from the corresponding infinite population in Fletcher & Webster (1996), and summary statistics of our finite population are given in Table 1. Survey types and total sample sizes considered are presented in Table 2. It should be noted that the use of equal sample sizes in the strata corresponds approximately to Neyman allocation.
For the simulation results presented in Tables 3 and 4, the nominal error rate is set to $100\alpha = 5\%$, and $B = 1{,}000$. For each survey type and total sample size, a lower (upper) error rate is computed as the percentage of simulation replicates for which the lower (upper) limit of the confidence interval is above (below) the population total, $Y$. Ideally, these error rates should both be 2.5%; however, due to skewness of the actual sampling distribution, the actual rates may be far from the desired values. In Tables 3 and 4, each pair of lower and upper rates is based on 10,000 repeated stratified samples from the defined population.
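The population bootstrap used for the BT and BF intervals can be sketched as follows: a fake stratum of size M_j is built from l_j copies of the observed sample plus d_j further values drawn without replacement from it, and a replicate of size m_j is then drawn without replacement from the fake stratum. The function names and the toy data are illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np


def fake_stratum(sample, M_j, rng):
    """Build a fake stratum of size M_j from the observed stratum sample."""
    sample = np.asarray(sample, dtype=float)
    m_j = sample.size
    l_j, d_j = divmod(M_j, m_j)            # l_j = integer part of M_j/m_j, d_j = M_j - l_j*m_j
    if d_j > 0:
        extra = rng.choice(sample, size=d_j, replace=False)
    else:
        extra = sample[:0]                 # nothing to add when M_j/m_j is an integer
    return np.concatenate([np.tile(sample, l_j), extra])


def bootstrap_replicate(samples, M, rng):
    """One population-bootstrap replicate, drawn independently within each stratum."""
    replicate = []
    for sample, M_j in zip(samples, M):
        fake = fake_stratum(sample, M_j, rng)
        replicate.append(rng.choice(fake, size=len(sample), replace=False))
    return replicate


# Hypothetical stratified sample from strata of sizes 8 and 4.
rng = np.random.default_rng(2015)
replicate = bootstrap_replicate([[1.0, 2.0, 3.0], [5.0, 7.0]], [8, 4], rng)
```

Repeating `bootstrap_replicate` B times and recomputing the Studentized statistic on each replicate gives the empirical distribution from which the bootstrap percentiles are read.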
For NT, NF, and BT, the lower rates are too low and the upper rates are too high, and the upper rates are worse than the lower ones. This holds true also for the BF method, except that it produces too high lower rates in some cases. The NT method is worse than the other methods, except that the BT method produces even worse lower rates when the stratum sizes are small. The best lower rates in the case of small and intermediate stratum sizes are given by the NF method, whereas the BF method appears to give better lower rates for large stratum sizes. With respect to the upper error rates, the BF method is uniformly better than all the other methods, and BT and NF perform about equally well, although the latter is not as good when the total sample size is large. The best overall error rates are provided by BT and BF, where the latter is to be preferred if the stratum sizes are small.
The results in Table 5 show that BT, NF, and BF produce wider intervals than NT. For example, for survey type I and total sample size $m = 30$, NF produces intervals 2.0 times wider than NT, on average. The NF method results in narrower intervals than the BF method. For larger strata and total sample sizes, BT produces narrower intervals than NF, but the opposite holds true when stratum or total sample sizes are small.
Table 5 (NF row as extracted, survey types I, II, III repeated by total sample size): 2.0 1.9 2.0 1.9 1.9 1.9 1.9 1.9 1.9 1.9
We conclude this section by considering sampling from two real-world populations. The first real-world example uses data from the U.S. 1992 Census of Agriculture. As our variable of study, we use the number of farms with 1,000 acres or more. For this example, we use the four census regions of the United States (Northeast, North Central, South, and West) as strata, consisting of $M_1 = 220$, $M_2 = 1054$, $M_3 = 1382$, and $M_4 = 422$ counties, respectively. The population is illustrated in Figure 1, and we see that it has a high degree of skewness (but not quite as extreme as in our previous example). We repeatedly take stratified samples from the population, with $m_1 = 21$, $m_2 = 103$, $m_3 = 135$, and $m_4 = 41$, so each sampling fraction, $f_{2j}$, is in the interval (0.090, 0.098). Again, we use $\alpha = 0.05$ and $B = 1{,}000$. The results, based on 10,000 repeated stratified samples from the population, are given in Table 6. The NT method is worse than the other methods. The overall winner is the NF method, with an overall error rate equal to 2.50 + 2.25 = 4.75, and with the BT and BF methods close behind. The NF, BT, and BF intervals are slightly wider than the NT intervals (about 1.4%, 2.6%, and 2.1% wider, on average).
The second real-world population consists of the 284 municipalities of Sweden and is called the MU284 population; it can be found in Särndal, Swensson, & Wretman (1992, pp. 652-659).
We use the variable CS82, which is the number of Conservative seats in a municipal council, and the $N = 50$ clusters as defined in Särndal, Swensson, & Wretman (1992). As before, we use $\alpha = 0.05$ and $B = 1{,}000$, and NT, NF, BT, and BF intervals are defined analogously to those in the case of stratified sampling. The results, based on 10,000 repeated cluster samples of size $n = 16$ from the MU284 population, are given in Table 7. We see that the lower rates are too low and the upper rates are too high, except for the NF lower rate, which is slightly larger than the nominal lower rate. The NF method has the best lower rate, whereas the BT method has the best upper and overall rates (but the worst lower rate). The NF, BT, and BF intervals are wider than the NT intervals (about 2.0%, 12.0%, and 15.2% wider, on average), but the NF interval is only slightly wider.

CONCLUDING REMARKS
In this paper we have derived a two-term Edgeworth expansion for the standardized sample total in two-stage sampling. For two very important special cases, namely stratified random sampling and cluster sampling, formal two-term Edgeworth expansions have been obtained for Studentized sample totals. We have illustrated that such results can be very useful for calculating skewness-adjusted confidence intervals for the population total. By themselves, the second-order terms in the expansion for the Studentized stratified sample total can improve coverage error to an extent comparable to what is achieved by using the bootstrap, and appear to be even better than the bootstrap on the lower limit.
Further improvements, at least on the upper limit, can be achieved by using these higher order terms together with the bootstrap (adapted to the finite population setting).

APPENDIX
First we specify Condition C, which is needed to ensure the result of Theorem 1.
We derive asymptotic expansions for the cumulants of $T_{str,m}$ by using a procedure somewhat similar to that outlined in Sugden & Smith (1997, Section 2). Put $W_{jk} = (y_{jk} - \bar{Y}_j) \nu_{2j}^{-1/2}$ and $V_{jk} = W_{jk}^2 - 1$, and define the sample means $\bar{w}_j = m_j^{-1} \sum_{k \in s_j} W_{jk}$ and $\bar{v}_j = m_j^{-1} \sum_{k \in s_j} V_{jk}$. [...] we obtain the following stochastic expansion of $T_{str,m}$ (and its powers to an appropriate order): [...]
Note, only the terms which will contribute to $a_{1,2}$ and $a_{3,1}$ are given explicitly on the right-hand side above. Also, note that $\bar{v}_1, \ldots, \bar{v}_N$ and $\bar{w}_1, \ldots, \bar{w}_N$ are two sequences of independent random variables, and that $\bar{v}_j$ and $\bar{w}_{j'}$, $j \ne j'$, are independent random variables. Below we will focus on the leading terms in the asymptotic expansions, and the notation l.t.(x) will be used to refer to the leading term of a quantity x. Recalling that $E[\bar{w}_j] = 0$, we have [...] and [...]
Let $\eta_{jk} = 1$ if $y_{jk}$ appears in the sample $s_j$, and 0 otherwise. Note that $P(\eta_{jk} = 1) = f_{2j}$, and that the sample means $\bar{w}_j$ and $\bar{v}_j$ may be written as [...] Finucan, Galbraith, & Stone (1974, p. 152), from which it follows that [...] where
$$a_{1,2} = -\frac{m^{-1} \sum_{j=1}^{N} m_j f_{2j}^{-3} g_{2j}^{2} \nu_{3j}}{2 \left( m^{-1} \sum_{j=1}^{N} m_j f_{2j}^{-2} g_{2j} \nu_{2j} \right)^{3/2}},$$
and $E[T_{str,m}^2] = 1 + o(1)$.
Consider the numerator in the first term on the right-hand side of (A15). By (A17), and by noting that $\bar{w}_j$, $j = 1, \ldots, N$, are independent random variables, we get [...] Next, consider the numerator in the second term on the right-hand side of (A15). We have [...] and from this, together with (A16), (A17), and (A18), we obtain the leading term of the second term on the right-hand side of (A15),
$$\frac{9 \sum_{j=1}^{N} m_j f_{2j}^{-3} g_{2j}^{2} \nu_{3j}}{2 \left( \sum_{j=1}^{N} m_j f_{2j}^{-2} g_{2j} \nu_{2j} \right)^{3/2}}.$$