4 Changes in the Distributions
In this section we concentrate on describing more fully the 1998 and 2004 distributions of working-age individuals’ earnings and total income, and their equivalised household incomes, and the changes in these distributions. We do this by constructing kernel density estimates of the distributions in order to provide a visual appreciation of the changes across the full range of earnings and incomes.[18]
For the analysis presented here we excluded zero earnings and incomes, but included negative self-employment earnings and incomes. In contrast to the descriptions provided thus far, this analysis is therefore correctly viewed as conditional (on non-zero incomes).[19] This has little impact on individuals’ equivalised household incomes since, as seen in Table 2, 98 percent of working-age individuals live in households with non-zero income in each year. However, only 84 percent of individuals have non-zero total income in either year; furthermore, only 68 percent and 73 percent of individuals have non-zero earnings in 1998 and 2004 respectively. Thus, any observed changes in the earnings distribution, in particular, between 1998 and 2004 may be due either to the increased employment in 2004 being non-random across the earnings distribution, or to earnings distributional changes conditional on employment. Also, each distribution has been estimated on a logarithmic scale, and we have left- and right-censored the income variable so as to restrict the income range displayed while still representing accurately the degree of (non-zero) mass in each tail.[20]
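The sample preparation described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `censored_log_income` is hypothetical, the censoring points are those given in note 20, and negative incomes are assumed to be assigned to the lower censoring point, which is one reading of the remark that censoring handles negative incomes on a logarithmic scale. Survey weights are ignored.

```python
import numpy as np

LOG_LOWER = 7.0   # left-censoring point, roughly $1,100 (note 20)
LOG_UPPER = 12.0  # right-censoring point, roughly $163,000 (note 20)


def censored_log_income(income):
    """Drop zero incomes, take logs, and censor at both ends.

    Assumption: negative incomes are placed at the lower censoring point,
    since their logarithm is undefined.
    """
    income = np.asarray(income, dtype=float)
    nonzero = income[income != 0]              # condition on non-zero income
    log_income = np.full(nonzero.shape, LOG_LOWER)
    positive = nonzero > 0
    log_income[positive] = np.log(nonzero[positive])
    # Censoring piles the tails up as mass points at the two limits.
    return np.clip(log_income, LOG_LOWER, LOG_UPPER)
```

Applied to a vector of incomes, the function returns one censored log value per non-zero observation, so the mass at each censoring point reflects the share of the conditional sample in that tail.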
We begin by describing the distributions of individuals’ conditional earnings in 1998 and 2004, and the changes between these years. Figure 3a shows the kernel density estimates of the distribution of earnings in 1998 (dotted line) and in 2004 (solid line), and the change between 1998 and 2004 (dashed line), calculated simply as the vertical distance between the 1998 and 2004 lines. From this figure it seems there were only comparatively small changes in the overall earnings distribution. There was a predominant, though modest, increase in earnings reflected by the rightwards shift in the distribution between 1998 and 2004, and perhaps more clearly in the tendency for negative changes in the distribution at low earnings levels and positive changes at higher earnings. For example, mean conditional earnings increased by about 7.5 percent, and median earnings increased by about 4 percent over this period. (Note that these increases are lower than the unconditional increases described in Figure 1a, because of the increase in employment over this period.)
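A minimal sketch of the density comparison plotted in Figure 3a is given below. It assumes a Gaussian kernel (the kernel used is not stated in the text), the constant bandwidth of 0.05 on the log scale from note 18, and unweighted data; the function names `kde_fixed_bandwidth` and `density_change` are hypothetical.

```python
import numpy as np


def kde_fixed_bandwidth(log_values, grid, bandwidth=0.05):
    """Gaussian kernel density estimate on a grid with a constant bandwidth."""
    log_values = np.asarray(log_values, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardised distance from every grid point to every observation.
    z = (grid[:, None] - log_values[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth


def density_change(log_1998, log_2004, grid, bandwidth=0.05):
    """Return both density estimates and their vertical difference."""
    f98 = kde_fixed_bandwidth(log_1998, grid, bandwidth)
    f04 = kde_fixed_bandwidth(log_2004, grid, bandwidth)
    return f98, f04, f04 - f98
```

Because both estimates integrate to one, the change line integrates to zero: negative changes at low earnings must be offset by positive changes elsewhere, which is the pattern described above.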
Given the similarity of the earnings distributions in 1998 and 2004, a simple way to compare them formally is to use a Kolmogorov-Smirnov (KS) test of the equality of two distributions.[21] Perhaps unsurprisingly, given that the mean and median increased over the period, the null hypothesis that the two distributions are equal is easily rejected here.[22] A more useful statistical comparison may be obtained by first adjusting one distribution so that the two distributions have the same mean (or median). For this purpose we have adjusted the 1998 earnings data, by the difference in means and standard deviations between the two years, so that the distribution of the adjusted data has the same mean and standard deviation as the 2004 distribution.[23] The KS test still easily rejects the hypothesis of equal distributions (p-value=0.000) and, although the maximum difference between the cumulative distributions falls from 4.4 percent to 3.4 percent, this adjustment does not noticeably improve the match between the 1998 and 2004 distributions.
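The adjustment and the "maximum difference between the cumulative distributions" can be illustrated with the sketch below. It assumes unweighted data and computes only the KS statistic, not its p-value; the function names `match_mean_sd` and `max_cdf_gap` are hypothetical, and the location-scale form of the adjustment is inferred from the description of matching the mean and standard deviation.

```python
import numpy as np


def match_mean_sd(y_base, y_target):
    """Rescale y_base so it has the mean and standard deviation of y_target."""
    y_base = np.asarray(y_base, dtype=float)
    y_target = np.asarray(y_target, dtype=float)
    scale = y_target.std() / y_base.std()
    return y_target.mean() + scale * (y_base - y_base.mean())


def max_cdf_gap(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    points = np.concatenate([a, b])
    # Empirical CDF of each sample evaluated at every observed value.
    cdf_a = np.searchsorted(a, points, side="right") / a.size
    cdf_b = np.searchsorted(b, points, side="right") / b.size
    return np.abs(cdf_a - cdf_b).max()
```

Comparing `max_cdf_gap(y98, y04)` with `max_cdf_gap(match_mean_sd(y98, y04), y04)`, where `y98` and `y04` are placeholder arrays for the two years, shows how much of the gap is accounted for by differences in location and spread alone.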
Figure 3b presents analogous kernel density estimates for the distributions of individuals’ total income in 1998 and 2004. There is a more noticeable shift in the income distribution than in the conditional earnings distribution, reflected by the 9-9.5 percent increases in the mean and median. Again, there is a (visually) modest drop in density in the lower income range, and an increase in the upper income range. A large fraction of the drop in mass around the $10,000-$13,000 range can be attributed to the effect of the increasing age of eligibility for New Zealand Superannuation (NZS) over the period: 63 and 64 year-olds were eligible for NZS in 1998, but were ineligible by 2004. We will see in the next section that the drop in density in this region translates into a strong impact on the growth of incomes between the 20th and 30th percentiles of the income distribution. The KS tests again reject the equality of both the 1998 and 2004 distributions and the 1998-adjusted and 2004 distributions (p-values=0.000); however, the adjustment again lowers the maximum difference in the cumulative distributions, from 6.2 percent to 2.6 percent.
Figure 3c presents the estimated distributions of working-age individuals’ equivalised household total income in 1998 and 2004. The changes in this distribution over time appear much clearer than those for individual earnings and income. There was a steady rightward shift in the distribution over most of the income range: the density fell over the equivalised income range $10,000-$30,000, and increased over the $30,000+ range. The KS test again rejects the equality of the 1998-adjusted and 2004-actual distributions (p-value=0.000), but the adjustment lowers the maximum difference between the cumulative distributions from 7.6 percent to 2.6 percent.
Our summary of the changes in the income distributions is that there have been comparatively steady increases in earnings and incomes for both individuals and households, and little evidence of any dramatic localised changes in the distributions. This conclusion is largely in line with the summary statistics presented in Tables 1 and 2 and the trend figures discussed in the previous section, and in sharp contrast to the dramatic distributional changes observed during the 1980s and 1990s in New Zealand (e.g. see Hyslop and Maré, 2001, 2005). This suggests focusing on changes in summary statistics of the distribution may provide an adequate account of the changes. This view is also supported by analysis of gross unequivalised household income not presented here.[24]
Notes
- [18]See Silverman (1986) for a detailed account of kernel density estimation, and Hyslop and Maré (2001, 2005), and Dixon and Maré (2004) for recent applications in the context of income distribution changes in New Zealand. We adopted a constant bandwidth of 0.05 across all the density estimates presented, which is somewhat lower than the so-called “optimal” bandwidth which varies between 0.07 and 0.09: the narrower bandwidth allows for more localised variations in the distribution to be identified without trading off too much smoothness in the estimates.
- [19]The effect of including zero incomes would be relatively transparent: first, there would be a “spike” in the income distribution at zero, corresponding to the fraction (p0, say) of individuals with zero income; second, the remaining non-zero distribution would be scaled down by a factor of (1-p0). In our view, the main consideration associated with the decision to in/exclude zeros relates to the ability to compare two distributions over time when the fraction of zeros changes: in our context this seems to be an issue only for earnings.
- [20]Earnings and incomes were left and right censored at log(income)=7 (approximately $1,100) and log(income)=12 ($163,000) respectively. The effect of censoring shows up on the figures as localised mass points at each end of the income ranges. In a logarithmic context, censoring also provides a convenient way to handle negative incomes.
- [21]For example, see Conover (1999). The KS test is quite powerful for detecting differences around the middle of the distribution, and differences due to clustering in the data, but not very powerful against differences in the tails of the distribution.
- [22]The p-value for this test is 0.000. Perhaps more meaningful is that the maximum difference between the cumulative earnings distributions in 1998 and 2004 is 4.4 percent.
- [23]Jenkins and Van Kerm (2004) present a formal discussion of this decomposition method. Specifically, we have constructed $\tilde{y}_{i98} = \mu_{04} + (\sigma_{04}/\sigma_{98})(y_{i98} - \mu_{98})$, where $\mu_t$ and $\sigma_t$ are the mean and standard deviation of earnings in year $t$, and $y_{i98}$ is individual $i$'s earnings in 1998. Similar results are obtained based on analogous adjustments using just the mean, and just the median.
- [24]In fact, KS tests do not reject the hypothesis that the mean and standard deviation adjusted household income distributions in 1998 and 2004 are equal.
