Identifying changes in the distribution of income from higher‐order moments with an application to Australia

Changes in the distribution of income over time are identified based on an adjusted two‐sample version of the Neyman smooth test by using subsampling methods to approximate the sampling distribution of the test statistic when samples are not independent of each other. A range of Monte Carlo experiments show that the approach corrects for size distortions arising from dependent samples as well as generating monotonic power functions. Applying the approach to studying the distribution of income in Australia over the business cycle and the Global Financial Crisis, the empirical results highlight the importance of higher‐order moments and demonstrate that business cycles are not all alike as the relative strengths of higher‐order moments vary over phases of the cycle.


Introduction
Identifying changes in the distribution of income over time is important for analysing movements in income inequality, poverty and welfare measures in general.Changes in the distribution of income have been studied in the literature from different aspects with special emphasis given to understanding how these changes occur over the business cycle through variations in the distributional moments.
The role of higher-order moments in income distribution models is investigated by McDonald, Sorensen & Turley (2013) using generalised beta distributions with special emphasise on modelling variations in skewness and kurtosis.Using non-parametric kernel density methods, Burkhauser et al. (1999) identify changes in skewness during the 1980s business cycles of the United States and the United Kingdom.Pittau & Zelli (2004) also use non-parametric kernel methods, but focus on Italy in the 1990s, while Pittau & Zelli (2006) investigate income distribution dynamics for the EU region.Related work is by Castañeda, Díaz-Giménez & Ríos-Rull (1998) and Bitler & Hoynes (2015) who study the business cycle dynamics of US income distribution, and Dutta, Sefton & Weale (2001) who focus on the distribution of income in the United Kingdom.Policy implications of changes in the distribution of income are reviewed by Kappes (2021).
A key assumption underlying many previous studies on testing for time variations in income distributions is that the samples at each point in time are independent.In the case where panel data are used and households are matched across years to obtain more precise estimates of changes in the income distribution, the assumption of independence is unlikely to hold given that household incomes are serially correlated over time.An exception is the approach of Vinh, Griffiths & Chotikapanich (2010) who explicitly model the dependence across samples at different points in time using either the Singh-Maddala bivariate parametric distribution or a bivariate copula.
To test for changes in the distribution of income over time, this paper proposes a modified version of the smooth test of Bera, Ghosh & Xiao (2013) and its extension by Song & Xiao (2022), based on the subsampling methods of Politis & Romano (1992) to correct for potential size distortions arising from samples from different years being dependent.The adoption of the Song & Xiao (2022) extended smooth test provides a scale adjustment of the Bera, Ghosh & Xiao (2013) statistic that corrects for size distortions when the sample sizes equal each other, as is the case from using panel data with matching households across samples.Under the null hypothesis, incomes across two time periods are assumed to come from the same distribution, whereas under the alternative hypothesis the distributions are allowed to differ.Further extensions that allow for testing the equality of several distributions are discussed by Zhou, Zheng & Zhang (2017) and more recently by Zhang, Guo & Zhou (2022).The distributions are specified non-parametrically, thereby allowing for changes in income distributions under the alternative hypothesis to arise from changes either in the parameters of the same distribution or in the functional form of the distributions.Income distributions can still be specified parametrically within the smooth class of tests; however, it would be necessary to estimate unknown parameters which would introduce additional sampling variability that could contaminate the size of test statistics, a result known as super-uniformity (Thomas & Pierce 1979).
The subsampling methods involve performing the test over rolling subsamples to construct the empirical distribution of the test statistic, which is then compared with the test statistic based on all of the data.To preserve the dependence structure in household incomes, the subsamples are paired to ensure comparisons are made in terms of the same households.Related work by Linton, Maasoumi & Whang (2005) also uses subsampling methods, although their focus is on testing for stochastic dominance of income distributions.Maasoumi & Heshmati (2000) also test for stochastic dominance among income distributions, but instead use bootstrapping methods based on a paired-bootstrap.Other related work involves non-parametric regression models (Chambers & Dhongde 2011) and quantile regressions (Doiron & Guttmann 2009).The statistical properties of the modified smooth test statistic are investigated in a range of Monte Carlo experiments, which are shown to exhibit good size and monotonic power as the sample size increases.Tests of higher-order moments within this class of tests, including skewness and kurtosis, also exhibit reasonable sizes.
The testing framework is applied to identifying changes in the distribution of income over the business cycle in Australia from 2001-02 to 2015-16 using the HILDA panel data, involving between 5000 and 8000 households per year.Investigating changes in the Australian income distribution over the business cycle is of interest as the Australian economy has been subjected to a broad range of domestic and external shocks since 2001.Domestic shocks have tended to arise from changes in government policy, such as the introduction of the Goods and Services Tax in July 2000 (Valadkhani 2005) and changes in superannuation in 2004and 2009(APRA 2007;;APH 2010).The minerals boom beginning early 2000s resulted in a two-speed domestic economy causing regional differences in incomes within Australia (Garton 2008).International shocks consist of the Global Financial Crisis (GFC) in 2008-09 originated from the United States, followed by the European debt crisis from 2010 onwards (Lim et al. 2010).
The empirical results highlight the importance of higher-order moments in characterising changes in the distribution of income over time.The results also show that business cycles are not all alike with the relative strength of moments determining income distribution movements varying across cycles.Income distribution changes in the early 2000s are due to changes in the mean, variance and skewness.For the economic downturn caused by the GFC in 2008-09 and the upswing in the cycle peaking in 2010-11, it is the changes in the higher-order moments of skewness and kurtosis that are the more dominant mechanisms.For later cycles, the distribution of income and the business cycle appear to be independent of each other.
The remainder of the paper proceeds as follows.Section 2 discusses the properties of the panel data on the distribution of income in Australia from 2001-02 to 2015-16.Details of the econometric framework based on the adjusted Neyman smooth test are reviewed in Section 3, together with a discussion of the subsampling methods used to perform inferences.The sampling properties of the test are investigated in Section 4 using a range of Monte Carlo experiments.The testing framework is applied in Section 5 with concluding comments and suggestions for future research given in Section 6.Additional derivations and simulation results are contained in the Appendices.

Properties of income distributions in Australia
To motivate the form of the income distribution dynamical tests proposed in this paper, this section summarises the statistical properties of income distributions over time in Australia.The data consist of annual real disposable incomes of Australian households for the consecutive financial years from 2001-02 to 2015-16, taken from the HILDA panel database (Summerfield et al. 2015).
Disposable income is defined as gross income minus direct taxation obligations, which is consistent with the definition of income adopted by the Australian Bureau of Statistics (ABS).Gross income is obtained by summing all regular sources of income from both private and public sources, including family tax benefits and childcare benefits, but excluding irregular sources of income (Vinh, Griffiths & Chotikapanich 2010).Households who report non-positive disposable income or negative private income are excluded (Buddelmeyer & Verick 2006).To adjust for inflation disposable income is divided by the consumer price index (CPI) obtained from the ABS, based on 1989-90 Australian dollars.
To account for differences in the size of each household, annual household real disposable income is adjusted using the OECD equivalence scale.This is achieved by defining an equivalence factor, or equalised household size, which allocates points to each person in the household with 1 point allocated to the first adult, 0.5 to each additional person of age 15 years and over, and 0.3 to each child under 15.The equivalence factor for a household is obtained by summing the points of all household members (ABS 2013), with the equalised household real disposable income calculated by dividing household real disposable income by the equalised household size.Finally, to circumvent problems with outliers, incomes in the top 0.5% of the distribution are excluded.
Summary statistics on household real equalised disposable incomes in Australia are given in Table 1 for consecutive financial years beginning 2001-02.Average incomes progressively trend upwards from $37,730 in 2001-02, reaching a peak of $46,860 at the start of the GFC in 2008-09, before falling over the next two years to $46,180.The fall in real income during this period is even more dramatic based on median incomes which decrease from $43,030 in 2008-09 to $40,390 in 2010-11.Average incomes begin to recover during the post-GFC period from 2011-12, eventually achieving comparable income levels by 2013-14 as obtained in 2008-09, before softening in the last two years of the sample.In comparison, median incomes while also peaking in 2008-09, remain relatively flat thereafter.
The post-GFC period is characterised by a widening of the difference in the highest and lowest incomes.In 2011-12, the highest and lowest incomes are respectively $193,930 and $50.This range widens in subsequent years where in 2012-13, the maximum is $201,290 compared to a minimum income of $20, and in 2013-14 the maximum income increases further to $208,030 compared to a minimum real income of $80.There is evidence of increasing positive skewness prior to the GFC, especially from 2004-05 onwards, which decreases in 2008-09 with skewness falling from 1.44 in 2007-08 to 1.19 in 2008-09.The drop in skewness is only temporary with skewness quickly returning to previous levels, reaching a peak in 2013-14.Kurtosis exhibits behaviour similar to skewness, as it increases during most of the period prior to 2008-09, then decreases in 2008-09 before returning to its previous levels.The evolution of the Australian income distribution is presented in Figure 1 from 2001-02 to 2015-16, which is based on a non-parametric kernel density estimator with a rule-of-thumb bandwidth (See Pittau & Zelli (2004), Burkhauser et al. (1999), Ezcurra & Pascual (2009) and Liu & Zou (2011) for a discussion of bandwidth choices, and Martin, Hurn & Harris (2013) for an overall discussion of non-parametric methods.There are variations over time in the peak of the income distribution, especially during the GFC period.A snapshot of these variations during this period is given in Figure 2. Between 2008-09 and 2010-11 there is a downward shift in the mode of the distribution, with the income distribution becoming more positively skewed, a result which is reflected by the skewness statistics reported in Table 1 for these years.
The last column of Table 1 presents the correlation of income between two adjacent financial years.The asterisk denotes significant correlation at the 1% level.It is evident that there exhibits a strong correlation in the income across adjacent years, and the strength of correlation varies over time.We use the cross-sectional dependence test of Pesaran (2004) to measure the overall level of dependence for an unbalanced panel.Although in this case, because the main concern is the time-series dependence in the distribution of income, we exchange the cross-sectional and time-series dimensions in constructing the test statistic.The value of the test statistic is 11.15, which far exceeds any reasonable critical values for a standard normal distribution under the null of no dependence.

Econometric methodology
This section provides the details of the econometric framework used to test for changes in the distribution of income over time based on panel data.The approach is a modified version of the Bera, Ghosh & Xiao (2013) two-sample smooth test and the extension by Song & Xiao (2022), by using the subsampling techniques of Politis & Romano (1992) to correct for potential size distortions arising from incomes in different years of the panel being dependent.To establish notation, we first review the construction of the two-sample version of the smooth test before discussing the proposed inferential methods based upon subsampling to correct for size distortions.

Setup
Let household incomes at two different points in time be {X i } N i =1 with sample size N and cumulative distribution function (CDF) F (x ), and {Y i } M i =1 with sample size M and CDF G(y).In the case of the panel data discussed in Section 2 where households are matched across the samples, N = M .To test for a change in the CDF the respective null and alternative hypotheses are and Under the null hypothesis, the two samples come from the same distribution.
To test the null hypothesis, the distribution function F is evaluated at Y by defining the random variable Z = F (Y ).An important property of Z is that if the null hypothesis of no change in the distribution is true, then it has a uniform distribution, Z ∼ U (0, 1).To capture deviations from uniformity under the alternative hypothesis changes in the distribution are captured by defining the generalised uniform distribution The term c(θ) = 1/ 1 0 exp K j =1 θ j π j (z ) dz is the constant of integration, the integer K is the order of the approximation and π j (z ) are the standardised Legendre orthonormal polynomials, with the first four polynomials given by Given the form of the generalised uniform distribution in (2), the null and alternative hypotheses in (1) are re-expressed in terms of the unknown parameters θ j , as H 1 : at least one of the restrictions above does not hold. (4) Based on the distributional characteristics of the income data discussed in Section 2, we focus on the first four moments by setting K = 4, and hence h(z ) captures the main distributional changes in incomes arising from changes to its centre and variability, as well as from changes to its skewness and the fatness in the tails of the distribution.A property of Legendre polynomials is that each polynomial π j (z ) in (3) captures the effect of specific moments on the distribution of income, with π 1 (z ) capturing the mean, π 2 (z ) the variance, π 3 (z ) skewness and π 4 (z ) kurtosis.

Test statistic
A Lagrange multiplier (LM ) testing framework is adopted to test the strength of the changes in the distribution for each moment in (2), as well as jointly for changes in all four moments.This has the advantage that the LM test statistic is relatively easy to implement as it avoids the need to estimate the parameters θ i under the null hypothesis, which would be the case for either a likelihood ratio or a Wald test.
Letting θ 0 represent the parameter vector under H 0 given in (4), the LM statistic is defined as where M is the sample size of Y i , G(θ 0 ) is the gradient vector of the log-likelihood and I (θ 0 ) is the information matrix, both of which are evaluated under the null hypothesis at θ = θ 0 .Based on the null hypothesis in (4), LM is given by (see Appendix A for detailed derivations) where represents the sample mean of the j th Legendre polynomial evaluated at Z i .
As F (Y ) is unknown, a non-parametric estimator is used to evaluate Z i in (6) based on the empirical distribution function where 6) with Z i based on (7) to have an asymptotic χ 2 K under the null hypothesis, the sample size M needs to be sufficiently smaller than N to control for the error in estimating F .Bera, Ghosh & Xiao (2013) show that the condition is (log log N )M /N → 0 as M , N → ∞.As panel data are used in comparing the incomes of households at different points in time, the samples M and N are equal, resulting in the test statistic in (6) being oversized.To correct for the size distortion an augmented version of ( 6) is used instead based on the approach of Song & Xiao (2022) given by For the special case of N = M , the scale factor MN /(M + N ) reduces to M /2 = N /2.Individual tests of each moment are based on the statistics with LM 1 representing a test of the mean, LM 2 a test of the variance, LM 3 a test of skewness and LM 4 a test of kurtosis.

Inferential methods
For the asymptotic distribution of the LM statistic in (8) to have the usual χ 2 distribution with K degrees of freedom, the incomes in the two samples need to be independent.This assumption is also invoked in the statistic proposed by Zhou, Zheng & Zhang (2017).For data taken from a panel study where incomes of the same household units are compared at different points in time, the independence assumption is clearly violated given incomes of households are highly serially correlated over time (Vinh, Griffiths & Chotikapanich 2010).For applications that are not based on panel data, but instead use income from different regions or countries, the assumption of independence between samples becomes more tenable.
The approach adopted here is to use a simulation procedure based on subsampling methods to correct for size distortions (Politis, Romano & Wolf 1999).Subsampling methods are also used by Linton, Maasoumi & Whang (2005) in stochastic dominance tests of income distributions, as well as by Martin, Tremayne & Jung (2014) and Dungey et al. (2020) in estimating and testing dynamic count models.Subsampling is based on approximating sampling distributions by recomputing statistics over subsamples of the data.The approach is applicable to a broad range of data generating processes including dependent data as is the case with panel data on incomes of matched households over time.
The approach consists of choosing a block size B = O(M ) such that B → ∞ as M → ∞, provided that B /M → 0, and constructing rolling subsamples of the two income distribution samples, with the maximum number of subsamples being M − B + 1.
There exist a number of ways for choosing the block size B .A potential range of B is log log M to M /(log log M ), although this range is very wide.Politis, Romano & Wolf (1999) discuss a number of alternative methods for choosing the block size, while Linton, Maasoumi & Whang (2005) propose some additional heuristic methods.In the simulations and empirical investigation, the block size is chosen as B = [αM γ ] with α = 5 and γ = 0.5, where [• ] is the rounding operator.However, the simulation results in Section 4 are found to be robust to a range of α and γ values.
As it is incomes of households that are being compared in the construction of the probability integral transform, the subsamples are paired to preserve their dependence structure.For each paired subsample, ( 7) is computed and a LM test of uniformity based on either (8) or ( 9) is constructed.This step is repeated for each subsample resulting in M − B + 1 values of the LM statistic, which are used to construct a critical value based on a chosen p-value.The LM statistic is computed for the full sample and compared with the simulated critical value, with the null of no change in the income distribution between the two points in time rejected when the LM statistic based on the full sample period exceeds the critical value.
An alternative approach is to use bootstrapping methods to construct a test of equality among distributions.Arguments in favour of subsampling over bootstrap methods are discussed in Linton, Maasoumi & Whang (2005, p. 744).For a bootstrap test to generate the correct size, it is necessary to impose the null distribution of distributional equality on the bootstrap samples.In addition, to preserve the time series characteristics of household incomes a paired bootstrap can be implemented.In related work Hurn, Martin & Xu (2022) use bootstrapping methods based on a paired bootstrap to correct for potential size distortions arising from the 'super-uniformity' condition as defined in footnote 3 when F (Y ) is chosen parametrically and the unknown parameters need to be estimated.An alternative approach would to be to use subsampling to correct for size distortions arising from estimating parameters.

Sampling properties
The sampling properties of the LM test statistics in ( 8) and ( 9) with inferences based on subsampling are investigated.The main focus of the experiments is on the special case of equal sample sizes in order to capture the features of panel data whereby incomes of the same household are compared across year.Simulation results based on dependent and independent samples are presented, as well as for the statistic of Song & Xiao (2022) where inferences are not based on subsampling methods.Additional simulation results are presented in Appendices B and C.

Simulation design
The simulation results of the LM test based on subsampling methods are reported for the case of equal sample sizes given by N = M ∈ {500, 1000, 2000, 4000, 8000}.The maximum sample size of N = 8000 is chosen based on the maximum sample size of household incomes presented in Section 2. The results are reported for the joint test LM J in (8), and the individual tests in (9) corresponding to the first four moments given by LM 1 , LM 2 , LM 3 and LM 4 , where the block size is set at Simulated incomes are obtained by sampling from a bivariate log-normal distribution, which is achieved by drawing random numbers from a bivariate normal distribution and taking the exponential of the simulated normal random variables.For the size experiments, the means and variances of the two distributions are equal, whereas for the power experiments, these parameters in general differ across the experiments to capture changes in the moments of the income distribution.Sample dependence is controlled by the correlation parameter with ρ = 0.0 (independence), ρ = 0.5 (positive dependence) and ρ = −0.5 (negative independence).The number of replications is set at 10,000 for all simulation experiments.

Size properties
Table 2 reports the size properties of the LM statistics with subsampling where the two samples are either independent or dependent.The population parameters of the two distributions are set at μ = 10.5 and σ = 0.6, which are based on the parameter estimates obtained from the income distribution data discussed in Section 2 assuming a lognormal distribution.The log-normal distribution is commonly used in empirical research (Wierenga 1978;Kmietowicz 1984;Park & Marron 1990;Sarabia et al. 2007).Other parametric distributions also used are gamma (Salem & Mount 1974;McDonald, Sorensen & Turley 2013), Weibull (Singh & Maddala 1976), Pareto (Champernowne 1953;Singh & Maddala 1976), mixed exponential (Creedy & Tan 2007) and the bivariate Singh-Maddala distribution (Singh & Maddala 1976).Some of these distributions are used in the power experiments conducted next.For comparison, results are also reported for the LM test without subsampling by using an asymptotic χ 2 distribution to compute p-values.The results in Table 2 are presented for a nominal size of 5% with the results for 1% and 10% nominal levels given in Appendix B.
The empirical sizes in Table 2 of the joint and individual LM statistics based on subsampling are all close to the nominal size of 5%, with a tendency to be marginally undersized.This result occurs for all sample sizes ranging from 500 to 8000, and regardless of whether the samples are independent or dependent.In comparison, Table 2 shows that by performing inferences without subsampling can lead to size distortions when the two samples are dependent, but not when they are independent.The LM J and LM 1 statistics without subsampling are undersized (over-sized) when the two samples are positively (negatively) correlated, with LM 1 exhibiting the greater size distortions.In the case of positively correlated samples, the empirical sizes are less than 1% compared to the nominal 5% level.For negatively correlated samples, LM 1 exhibits size distortions approximately twice as large as the nominal size of 5%.In contrast, LM 2 and LM 4 tend to have empirical sizes slightly below the 5% nominal level no matter whether the samples are positively or negatively correlated, whereas LM 3 tends to be undersized (over-sized) for positively (negatively) correlated samples.The same phenomenon is also observed for nominal sizes of 1% and 10% in Tables B1 and B2, respectively, given in Appendix B. Appendix C provides additional simulation results which extend the Monte Carlo results reported in Bera, Ghosh & Xiao (2013), by highlighting the impact of dependent samples on the size properties of the LM statistics without subsampling when differences in the size of the two samples do not satisfy the asymptotic condition based on the rule M < N 1/2 (Bera, Ghosh & Xiao, 2013, p. 424).

Power properties
The power properties of the joint and individual LM statistics based on subsampling are presented in Table 3. Experiments are presented which allow for changes in the first four moments of the G(y) distribution function, both individually and jointly.As with the size experiments, results based on independent samples are presented, as well as samples that are either positively or negatively correlated.

Change in mean
The first power experiment allows for an increase in the mean of G(y) while holding the variance between F (x ) and G(y) the same.The F (x ) distribution is log-normal with parameters μ = 10.5 and σ = 0.6 as for the size experiments, whereas the mean of the G(y) distribution increases by 10% without a change in the variance.To hold the variance constant, the log-normal parameters of G(y) are chosen as μ = 10.62 and σ = 0.55 (to 2 decimal places).The results of this experiment are given in the top left block of Table 3 under the header 'Change in mean' for a nominal size of 5%, with the results for 1% and 10% given in Tables B3 and B4, respectively, of Appendix B.
The joint and all four individual LM statistics exhibit monotonic power functions providing evidence these are consistent tests.The LM J statistic in the case of independent samples has power of 67% for N = M = 500, increasing to 95% for N = M = 1000 and further increasing to 100% for N = M = 4000.The LM 1 statistic overall has marginally higher power than LM J , which is a reflection that the change in G(y) is the result of a change in the mean of this distribution.
Even though the change in the distribution arises from a shift in the mean while holding the variance constant, nonetheless LM 2 still exhibits increasing power in this direction, although not surprisingly exhibiting lower power than LM 1 .The LM 3 and LM 4 statistics exhibit relatively low power in detecting the change in the mean, even though there is a change in the higher-order moments with skewness decreasing from 2.26 for the F (x ) distribution to 2.01 for the G(y) distribution, while kurtosis decreases from 10.27 for F (x ) to 7.95 for G(y).
When there is positive dependence between the two samples, LM J and LM 1 exhibit even higher power than in the case of independent samples.The opposite occurs for the case of negatively dependent samples, with LM J and LM 1 exhibiting relatively lower power than achieved for independent samples.No such changes are exhibited by LM 3 and LM 4 , which tend to display similar powers regardless of whether the two samples are dependent or not.

Change in variance
In the second power experiment, the F (x ) distribution is the same as it is in the mean power experiment, while the variance of G(y) increases by 10% by choosing the parameters μ = 10.49and σ = 0.62 (to 2 decimal places) to ensure there is no change in the mean of the G(y) distribution.The results of this experiment are given in the top right block of Table 3 under the header 'Change in variance' for a nominal size of 5%.
All LM statistics have monotonic power functions with LM 2 exhibiting the highest power, which is a reflection that the change in the distribution is the result of a change in the variance.This result also holds for both positively and negatively dependent samples.The joint test statistic LM J also exhibits similar, but slightly smaller power properties than LM 2 .The LM 1 and LM 4 statistics display much lower powers than LM 2 and LM J , while LM 3 has no power in detecting a change in the distribution arising from an increase in the variance.As requested by a referee, we also conduct simulation experiments to investigate the power of the joint and individual LM statistics when the mean and variance of the distribution are both allowed to change.The simulation results find that LM J and LM 1 have high power in detecting such changes, whereas the power of LM 2 is weakened when compared with the power results presented in Table 3 for the separate mean and variance experiments.These results are presented in Table B5 of Appendix B.

Change in skewness
The third experiment examines a change in skewness by generating data from a gamma distribution for G(y).The gamma distribution has parameters α = 2.31 and β = 18,840 such that G(y) has the same mean and variance as F (x ).The experiment represents a reduction in skewness, as the skewness of the F (x ) distribution is exp(σ 2 ) + 2 exp(σ 2 ) − 1 = 2.26, which is higher than that of G(y), 2/ √ α = 1.32.The results of the skewness experiment are tabulated in the bottom left block of Table 3.The joint LM J statistic is able to detect a change in the income distribution even with relatively small sample sizes of N = M = 1000.The LM 2 and LM 3 statistics, and to a lesser extent LM 4 , exhibit strong power when there is a change in skewness.This is not the case for LM 1 , which exhibits no power in detecting a change in skewness with powers close to the nominal size of the test.These results hold for whether the two samples are independent or dependent.

Change in kurtosis
The last power experiment allows for a change in kurtosis by defining G(y) as the log-Student t distribution function with ν = 5 degrees of freedom, while keeping the remaining parameters fixed at μ = 10.5, σ 2 = 0.6 2 .The results of the kurtosis experiment are given in the bottom right block of Table 3.
Inspection of Table 3 shows that LM 4 detects a change in the distribution arising from a change in kurtosis, with the strength of the power increasing monotonically as the sample size increases.The joint statistic LM J has stronger power than the individual tests.The LM 2 statistic also exhibits power to a change in kurtosis albeit with lower powers than the LM 4 statistic.The LM 1 and LM 3 statistics display very little power, which is consistent with the experiment representing a change in an even-order moment.A comparison of the power functions of the LM statistics for alternative values of the correlation parameter suggests that ρ has little impact on the power properties of these statistics for this experiment, with the power functions of LM J , LM 2 and LM 4 tending to be marginally higher for positively correlated samples than they are for negatively correlated samples.

Application to the Australian distribution of income
The testing methodology presented in Section 3 is now applied to identifying changes in the Australian income distribution using panel data on household incomes from 2001-02 to 2015-16, as discussed in Section 2. Special attention is given to identifying changes in the distribution of income over the business cycle by testing for changes during upswings from troughs to peaks, downswings from peaks to troughs, as well as changes across business cycles by measuring differences in income distributions between troughs of neighbouring cycles and between peaks of neighbouring cycle.
The phases of the Australian business cycle are based on the Melbourne Institute: Applied Economic & Social Research who publish the peaks and troughs of the Australian economy (https://melbourneinstitute.unimelb.edu.au/publications/macroeconomic-reports/phases-of-business-cycles-in-australia).The business cycle turning points are determined using the BBQ methodology of Harding & Pagan (2002), which is a quarterly version of the Bry & Boschan (1971) algorithm corresponding very closely to what is used by the National Bureau of Economic Research (NBER) in the United States to date the US business cycles.This algorithm identifies a peak (trough) at time t, based on the condition that {y t > (<)y t±k , k = 1, 2, . . ., K }, where y t is the log of real GDP and K = 5 is usually used for monthly data.

Income distributional variations over the business cycle
The results of the LM tests of changes in the distribution of income across different phases of the business cycle are summarised in Table 4, which gives the p-values of the joint test LM J that combines the first four moments, as well as the individual tests for each of the four moments.The subsampling block size is set at B = [5N 1/2 ], where N is the sample size as given in Table 1.

Phase: Trough to peak
The LM tests of changes in the income distribution for the upwsing phases from troughs to peaks of the business cycle are presented in the first panel of Table 4.For the 2012-13 to 2013-14 phase of the cycle, no significant variations are detected.This is not the case for the 2002-03 to 2006-07 phase where the joint test reveals significant variations in income distributions from the trough to the peak phase of the cycle, with a p-value of 0.0066.The individual tests provide evidence of significant changes in the mean and skewness of the distribution of income.There is also, albeit marginally weaker evidence of a change in the variance of the income distribution, with LM 2 having a p-value of 0.0571.In contrast, there is no significant evidence of changes in the distribution's kurtosis where the p-value of LM 4 is 0.1615.The phase of the business cycle from 2008-09 to 2010-11 corresponds to the recovery period from the GFC.The LM 3 and LM 4 statistics provide significant evidence of changes in the higher-order moments of skewness and kurtosis respectively, but not in the lowerorder moments as given by the mean and the variance.The p-value of the joint test is 0.0591, which is just marginally in excess of the 5% level.This outcome is a reflection of a loss of power from using a joint test as a result of the statistic being conducted over all four moments.

Phase: Peak to trough
The second panel of Table 4 presents the p-values of the LM tests of income distributional changes in the peak to trough phase of the business cycle.The 2006-07 to 2008-09 represents the downturn in the business cycle arising from the GFC.The joint test reveals a significant change in the income distribution over this phase, which is caused by changes in the odd moments given by the mean (p-value of 0.0097) and skewness (p-value of 0.0000).Similar results are obtained by the joint test for the downturn phase of the business cycle from 2010-11 to 2012-13, with the difference being that it is now changes in the evenmoments corresponding to the variance (p-value of 0.0032) and kurtosis (p-value of 0.0140) that characterise changes in the distribution of income.For the 2013-14 to 2015-16 phase of the business cycle, there is much weaker evidence of variations in the distribution of income with the only evidence of any change coming from the variance (p-value of 0.0641).

Phase: Trough to trough
The third panel of Table 4 gives the results from testing for variations in the distribution of income across neighbouring troughs of the business cycle.There is no significant evidence of a change in the distribution of income at the start of the sample period between the 2001-02 and 2002-03 troughs.This is not the case when comparing income distributions across the 2002-03 and 2008-09 troughs, where there is strong evidence of a significant change in all moments with the exception of skewness.Similar results are obtained by comparing the 2008-09 to 2012-13 business cycle toughs where the main changes arise from the higher-order moments of skewness (p-value of 0.0000) and kurtosis (p-value of 0.0395).The main significant variations of the income distribution in the 2012-13 and 2015-16 troughs arise from the variance (p-value of 0.0474).

Phase: Peak to peak
The last panel of Table 4 gives the results from testing for changes in income distribution across business cycles from peak to peak.Income distributional changes between the business cycle peaks of 2006-07 and 2010-11 primarily arise from a change in skewness, although the p-value of the test is marginally above 5% (p-value of 0.0719).Comparing income distributions across the business cycle peaks in 2010-11 and 2013-14, the results provide even stronger evidence of changes in the distribution of income with the main variations stemming from the variance (p-value of 0.0070) and skewness (p-value of 0.0233).

Income distributional variations around the GFC
The empirical results presented above suggest the business cycle associated with the GFC resulted in a change in the moments characterising income distribution movements over time, with higher-order moments tending to become the dominant feature of these changes.To investigate further the changes in the Australian income distribution during this downswing period of the business cycle, the LM tests are applied to neighbouring years during the period of the GFC.
The results in Table 5 further highlight the importance of higher-order moments in characterising income distributional changes.For the phase prior to the trough in 2008-09, namely 2007-08 to 2008-09, the main effect of the GFC is to change the skewness of income distribution, with no evidence of any significant effects on the mean and variance, or even kurtosis.For the following year in 2008-09 to 2009-10, both skewness and kurtosis are now jointly important without any significant changes detected in the mean or variance of the Australian income distribution.This change in the characteristics of the income distribution is also reflected by the change in the significance of the LM J statistic, which becomes statistically significant at the 5% level with a p-value of 0.0255.For the year 2009-10 to 2010-11, immediately following the business cycle trough, all LM statistics reported in Table 5 are statistically insignificant suggesting that the effects of the GFC on the distribution of income in Australia had eventually stabilised.

Conclusions and suggested extensions
This paper uses an adjusted two-sample version of the Neyman smooth statistic to identify changes in the distribution of income in Australia over the business cycle from 2001-02 to 2015-16, by matching households from panel data across different years.An important feature of the approach is the use of subsampling methods to correct for potential size distortions arising from household incomes being dependent over time.The approach is able to not only detect changes in income distributions over alternative phases of the business cycle but also identify the underlying changes in the moments of the distribution, such as the mean and the variance, as well as in the higher-order moments including skewness and kurtosis.Simulation experiments show that the proposed testing framework corrects for potential size distortions arising dependent samples, as it yields empirical sizes comparable to the nominal size as well as yielding monotonic power functions.
The empirical results highlight the importance of higher-order moments in characterising movements in income distributions over time.As the relative strength of the moments determining income distributional movements vary across cycles, this provides evidence that Australian business cycles are not all alike.Income distribution movements over business cycles in early 2000 are the result of changes in the mean, variance and skewness.For the GFC downturn in 2008-09 and the upswing that immediately follows, it is the changes in skewness and kurtosis that are the dominant factors in time-variations in the Australian income distribution.These results highlight the need for specifying flexible parametric models that capture variations in income distributions.This is critical when modelling time variations in income distributions over business cycles given that business cycles are not found to be alike as the relative strength of moments varies for different phases of the cycle as well as for different cycles over time.
The theoretical and empirical analyses in this paper can be extended in a number of ways.First, the construction of the two-sample version of the LM smooth test is based on comparing non-parametric income distributions over time using the probability integral transform.An alternative approach is to apply the probability integral transform to a flexible class of parametric income distributions (see, e.g., McDonald, Sorensen & Turley 2013).As the parameters of parametric distributions typically need to be estimated, testing based on asymptotic distribution theory produces additional biases as a result of super-uniformity (Thomas & Pierce 1979).To determine the ability of the proposed subsampling design to correct for size distortions arising from super-uniformity, the Monte Carlo simulations presented in this paper can be extended by replacing the non-parametric distribution by a parametric distribution function where the unknown parameters need to be estimated.Second, while the analysis is based on comparing unconditional income distributions at different points in time, an interesting extension would be to allow for control variables to capture household characteristics and other factors such as regional differences in incomes (see, e.g., Johnson & Wilkins 2004).This can be implemented by performing the LM tests on household incomes decomposed in terms of particular characteristics.One strategy would be to decompose household incomes by states to allow for regional differences in incomes over time to be identified.This strategy would be especially important for Australia where a two-speed economy was found to be operating during the minerals boom in the early 2000s.An alternative approach would be to specify income distribution models by conditioning on household characteristics, and apply the testing framework to the residuals of these estimated models following Linton, Maasoumi & Whang (2005).
Third, the two-sample smooth tests based on subsampling methods presented in this paper focus on testing for changes in income distributions at two different points in time.Extensions to allow for multiple comparisons at several points in time could be implemented using the testing methodology of Zhang, Guo & Zhou (2022), for example.Given that they approximate the asymptotic null distribution using the cumulants of the chi-square distribution, it would be of interest to compare the sampling performance of this approach with one based on the subsampling methods proposed here by extending their simulation results.

Appendix A Appendix: Theoretical derivations
This appendix reviews the derivation of the Lagrange multiplier version of the smooth test given in Bera, Ghosh & Xiao (2013).To construct the LM statistic based on the null hypothesis in (4), the log-likelihood function for a sample of size M for the generalised uniform distribution in equation ( 2 where the Z i are defined in (7) by using the empirical distribution function.
The j th element of the gradient vector G(θ ) of the log-likelihood in (A1) is Evaluating this expression under the null θ = θ 0 gives by using the result ∂ log c(θ )/∂θ j θ =θ 0 = 0. To show this result, from the definition of the normalising constant take natural logs and differentiate the expression to give Evaluating this expression under the null θ = θ 0 yields the result where the last step uses the properties of the standardised Legendre polynomial.
The Hessian of the log-likelihood in (A1) has typical element where the second step is based on differentiating (A2) by θ l .Evaluating this expression under the null θ = θ 0 , and changing the sign gives the corresponding element of the information matrix as ∂ 2 log c(θ )/∂θ j ∂θ l θ =θ 0 = −1 for j = l , and 0 otherwise.To show this last result, differentiate (A4) with respect to θ l and take unconditional expectations Evaluating this expression under the null θ = θ 0 gives the result which immediately follows from the orthonormality property of the standardised Legendre polynomials.
Substituting equations (A3) and (A5) into (5) gives the joint LM statistic where represents the sample mean of the j th Legendre polynomial evaluated at Z i .

Appendix B Appendix: Additional simulation results on the test size and power with equal sample sizes
This Appendix provides additional simulation evidence on the properties of the two-sample version of the LM statistics, with and without subsampling, where the latter is based on using the χ 2 distribution to construct p-values.Sample sizes are chosen to be the same, N = M , to reflect that household incomes from panel data are matched.The design of the simulation experiments is the same as discussed in Section 4.

Size properties
The simulation size results are reported in Table B1 (nominal size of 1%) and Table B2 (nominal size of 10%).These results are similar to the size results reported in Table 2 based on a nominal significance level of 5%.The empirical size of the LM statistics based on subsampling are close to the nominal sizes, although there is a tendency to be slightly over-sized at the 1% level and undersized at the 10% level.This result occurs regardless of whether the samples are independent or dependent.The LM version of the tests without subsampling yield sizes comparable to the nominal sizes in the case where the samples are independent.For positively correlated samples these statistics become undersized, with the size distortions tending to be greatest for LM 1 .In the case of negatively correlated samples, the extent of the size distortions is a function of the moments being tested with LM 1 and LM 3 being over-sized, and LM 2 and LM 4 being undersized.The net effect on LM J is that it exhibits similar properties to the odd-moment statistics.The size distortions of LM 1 can be quite severe with empirical sizes of around 18% for example, compared to a nominal size of 10%.

Power properties
Simulation power results are presented for the LM statistics based on subsampling in Tables B3 (nominal size of 1%) and B4 (nominal size of 10%).The power results are reported for four cases, corresponding to a change in mean (top-left), a change in variance (top-right), a change in skewness (bottom-left), and a change in kurtosis (bottom-right).
The results presented in Tables B3 and B4 are qualitatively similar to the power results reported in Table 3 based on a nominal significance of 5%.The simulations show that LM J is consistent with the power increasing monotonically as the sample sizes N and M increase subject to the condition N = M .This result holds regardless of whether the samples are correlated or independent.The LM 1 statistic exhibits the greatest power for the mean experiment, while LM 2 statistic exhibits the greatest power for the variance experiment.
The two bottom panels give simulation results from a change in the higher-order moments of skewness or kurtosis.For the skewness experiment LM J exhibits the greatest power, closely followed by LM 2 and LM 3 and to a lesser extent LM 4 .The first moment test statistic LM 1 has no power in the direction of skewness with powers close to the 1% and 10% nominal sizes.For the kurtosis experiment LM 4 displays the greatest power, followed by LM J and LM 2 .The odd-order moment test statistics LM 1 and LM 3 exhibit the lowest powers.As requested by a referee, we also conduct power simulation experiments to examine the power of the LM tests with subsampling when there are simultaneous changes to the mean and variance of the distribution.The experiments consist of simulating data of both samples from log-normal distributions, where the parameters of the F (y) distribution are the same as before, whereas the mean and variance of the G(y) increase by 10%.The results are presented in Table B5 and show that LM 1 exhibits the greatest power, closely followed by LM J .Of the remaining statistics, LM 2 displays the next highest powers followed by LM 3 and LM 4 .

Appendix C
Appendix: Size properties of the LM test without subsampling with different sized samples.
The simulation results presented in this appendix extend the Monte Carlo results of Bera, Ghosh & Xiao (2013) by highlighting the impact of dependent samples on the size properties of the LM test without subsampling when differences in the size of the two samples do not satisfy the asymptotic condition based on the rule M < N 1/2 (Bera, Ghosh & Xiao (2013), p. 424).The results are reported for the joint test LM J in (8), and the individual tests in (9).The sample size of the F (x ) distribution is fixed at N = 8000, while the sample sizes of the G(y) distribution are M = {50, 500, 1000, 4000}.The choice of a maximum sample size of N = 8000 is based on the annual sample sizes of household incomes used in the empirical application.
The results reported in Table C1 show that the empirical size of the LM statistics without subsampling are very close to the 1%, 5% and 10% nominal sizes for M = 50 and N = 8000.This result applies to independent samples as well as for dependent samples regardless of whether the correlation is either positive or negative.Increasing M to 500 causes the LM tests to become slightly oversized, but nonetheless still supporting the theoretical condition the LM tests are asymptotically distributed as χ with K = 4 degrees of freedom in the case of the joint test LM J , and K=1 degrees of freedom in the case of the individual tests LM j .Upon further increasing M with N = 8000, the size distortions become progressively more severe.For samples of M = 4000 with a 5% nominal significance level the empirical size of the joint test is just under 20% for the case of independent samples,  .66 11.32 11.51 11.59 10.56 10.01 10.41 10.57 10.66 12.46 12.43 10.48 11.41 11.18 1000 13.78 11.42 12.06 11.86 12.42 12.33 10.20 10.88 11.59 11.49 14.50 13.95 11.10 12.09 11.42 4000 26.66 17.83 18.56 18.26 17.76 19.04 10.01 14.43 16.68 16.84 29.37 24.48 14.90 19.59 16.70 Notes: The data generating process consists of two log-normal distributions with the same mean and variance.The number of replication is 10,000.
which is marginally higher for negatively correlated sample but less inflated for positively correlated samples.Individual test statistics exhibit relatively smaller size distortions than the joint test in general.For M = 4000 with 5% nominal significance level, the empirical sizes for the individual tests are approximately double the nominal size compared to the joint test where there is more than a threefold increase compared to the nominal size.The results also show that positive dependence between the two samples helps to reduce the size of the bias, whereas negative dependence tends to amplify the bias.Moreover, in the case of positively dependent samples, the empirical size of the individual tests become progressively more inflated the higher is the moment being tested.
An interesting outcome of the simulation results is that provided the LM statistics are correctly sized by choosing the two samples to satisfy the asymptotic condition M = O(N 1/2 ), the presence of dependent samples has little or no effect on the size properties of the tests.It is only when the tests experience size distortions does the presence of dependent samples have any impact, with positively dependent samples tending to soften the size distortion, while negatively dependent samples tending to amplify the distortion.

Figure 2 .
Figure 2. The distribution of income in Australia around the time of the GFC from 20007-08 to 2010-11, based on real equalised disposable income ($'000, base 1989-90).

©
2024 The Authors.Australian & New Zealand Journal of Statistics published by John Wiley & Sons Australia, Ltd on behalf of Statistical Society of Australia.
The Authors.Australian & New Zealand Journal of Statistics published by John Wiley & Sons Australia, Ltd on behalf of Statistical Society of Australia.

Table 2 .
Size of the LM tests with and without subsampling, and N = M .
Notes: Based on a 5% nominal level of significance with block size B = [5N 1/2 ].The two distributions have the same mean and variance.The number of replications is 10,000.

Table 3 .
Power of the LM tests with subsampling, for N = M .Based on a 5% nominal level of significance with block size B = [5N 1/2 ].The number of replications is 10,000. Notes:

Table 4 .
LM test p-values of the Australian income distribution for alternative business cycle phases from 2001-02 to 2015-16.
Notes: Subsampling is based on a block size of B = [5N 1/2 ].

Table 5 .
LM test p-values of the Australian income distribution around the GFC from 2007-08 to 2010-11.

Table B1 .
Size of the LM tests with and without subsampling, and N = M .Based on a 1% nominal level of significance with block size B = [5N 1/2 ].The two distributions have the same mean and variance.The number of replications is 10,000.
Notes: © 2024 The Authors.Australian & New Zealand Journal of Statistics published by John Wiley & Sons Australia, Ltd on behalf of Statistical Society of Australia.

Table B2 .
Size of the LM tests with and without subsampling, and N = M .Based on a 10% nominal level of significance with block size B = [5N 1/2 ].The two distributions have the same mean and variance.The number of replications is 10,000. Notes:

Table B3 .
Power of the LM tests with subsampling for N = M .

Table B4 .
Power of the LM tests with subsampling for N = M .

Table C1 .
Size properties of the LM tests without subsampling, for M < N .