Authors: Christopher Ball and Michael Ryan
Abstract#
This paper seeks to quantify how the welfare of different types of household changed between 2006/07 and 2009/10, a period that included the 2008/09 recession. We use three measures of household welfare: income, expenditure and the equivalent variation metric. The equivalent variation is a measure of the welfare lost owing to price changes. Using household level data from the Household Economic Survey (HES), we allocate households into "types" on one dimension (for example age group), as is traditional in the literature, but also cluster the data into 12 different representative households based on 9 demographic and economic dimensions. Households in low income groups, with children and/or who rent were particularly affected by the recession in terms of welfare losses owing to price changes. However, we find that those in low income groups had strong increases in expenditure; furthermore, the welfare gains from this increased expenditure more than offset the welfare losses from the price changes.
Acknowledgements#
Special thanks to John Creedy for helpful assistance throughout both the research and editorial process. Thanks to Hemant Passi, Nicolas Herault, Simon McLoughlin, Peter Mawson, Bryan Perry and participants at an internal Treasury seminar for helpful comments and to Emily Chai for helping with initial setup work on this project.
Disclaimer#
Access to data used in this study was provided by Statistics New Zealand under conditions designed to give effect to the security and confidentiality provisions of the Statistics Act 1975. Results presented in this study are the work of the New Zealand Treasury and not Statistics New Zealand.
The views, opinions, findings, and conclusions or recommendations expressed in this Working Paper are strictly those of the author(s). They do not necessarily reflect the views of the New Zealand Treasury or the New Zealand Government. The New Zealand Treasury and the New Zealand Government take no responsibility for any errors or omissions in, or for the correctness of, the information contained in these working papers. The paper is presented not as policy, but with a view to inform and stimulate wider debate.
Executive Summary#
This paper estimates how welfare changed for different types of New Zealand household between 2006/07 and 2009/10. This is an interesting period to study as it includes the 2008/09 recession. The metrics used to measure welfare are income, expenditure and equivalent variation. The equivalent variation measures the welfare change owing to price changes.
Households are divided into 12 types (or 'clusters') based on applying the K-harmonic means clustering technique to the 2006/07 Household Economic Survey (HES). Our clustering aims to group households in the same cluster that are more similar to each other than they are to households in different clusters on a number of dimensions. We use 9 different demographic and economic dimensions. Owing to the absence of longitudinal data, 'similar' households were then found in the 2009/10 HES dataset by clustering the 2009/10 dataset using the 2006/07 cluster centres.
Our key findings are as follows. First, the estimated welfare lost owing to price changes is substantially higher for households in the low income groups, households with children and households who rent. Second, those in the low income groups had strong increases in their expenditure; furthermore, the welfare gains from this higher expenditure were larger than the welfare lost owing to price changes.
Other interesting results are as follows. First, there was a shift from home ownership towards renting in younger age groups. This could reflect a reluctance by younger households to take on debt in the face of increased uncertainty or, alternatively, the tightening of lending standards by banks. Second, highly geared households and households with large mortgage payments relative to income have reduced their durable expenditure. This may indicate a desire to reduce debt given falling house prices and less certain labour market prospects. Third, consistent with the Household Labour Force Survey, we find that older households increased their participation in the workforce. Further, our analysis found that older households still working, including part-time, had much stronger disposable income growth than those that had fully retired.
1 Introduction#
New Zealand went into recession[1] in the first quarter of 2008 and did not grow in the 6 subsequent quarters. As a result, real GDP was 3.3% lower in the June quarter 2009 than it was in the December 2007 quarter.[2] The recovery has been slow. At the December 2011 quarter, real GDP had only just regained its December 2007 level.
Recessions can affect households in many ways; these include falling asset values, rising unemployment, and increased uncertainty. Additionally these phenomena affect different types of household with varying levels of severity. Generally, for example, older households have much of their wealth in assets (namely housing) and some, through downsizing to a smaller home, use this wealth to fund (the majority of) their retirement (Smith, 2007). This means that some older households are disproportionately affected by falls in asset prices relative to, say, young households, particularly renters. Conversely, increased unemployment in a recession may disproportionately affect younger cohorts. First, recessions make it harder to find a job upon initially entering the working age population and, second, recessions may make it harder to find a job that utilises one's skill set. For young people this means relevant skills become harder to acquire and/or skills gained elsewhere (for example in formal qualifications) depreciate, affecting their future labour market prospects. Finally, the increased uncertainty associated with a recession affects the behaviour of those with less of a buffer to absorb shocks, perhaps those highly in debt and those with large fixed outflows (relative to income).
In addition to the phenomena discussed above, the 2008/09 recession in New Zealand was also coincident with some large relative price movements in goods and services. [3] The varying weights of goods that rose and fell in price in different households' expenditure bundles mean there are likely to be heterogeneous impacts on different households' welfare owing to these price changes. One example is rents, which rose over the recession, whereas mortgage rates fell.
Using aggregate level data, such as private consumption and disposable income from the National Accounts or the Consumers Price Index, to draw conclusions on the impact of the recession on welfare is not ideal. This is because movements in these aggregates represent "the average" impact, and possibly mask the large differences that could have occurred across household types. This paper recognises the possibility there have been varying welfare changes for different household types over the recession and seeks to measure these changes using microeconomic data from the Household Economic Survey,[4] hereafter HES. This raises two issues: first, defining household "type"; and second, measuring the changes in welfare.
Section 2 discusses issues defining household type and measuring welfare. In Section 3, we look at the household types created using the traditional method in the literature. Section 4 outlines our alternative approach to creating household types, clustering, and reports the household types we create from applying this technique to the HES data. Section 5 reports how our first two measures of welfare, income and expenditure, changed during the recession by household type, both for the types created on the traditional basis and for those created by the clustering process. Section 6 initially looks at how the welfare impacts of the different price changes varied across different household types depending on the composition of their budget. It then discusses what happens when the welfare changes from expenditure and price movements are aggregated to give a sense of the overall effect of the recession on welfare for our different household types. Section 7 concludes.
Notes
- [1] Recession is defined as two consecutive quarters of negative real GDP growth.
- [2] As historical GDP estimates are subject to revision by Statistics New Zealand, these numbers are based on the June 2012 quarter GDP release.
- [3] Some of these price changes are directly attributable to the recession but some were not.
- [4] The Household Economic Survey (HES) collects information on household expenditure and income, as well as a range of demographic information on individuals and households.
2 Measurement issues#
2.1 Defining household types#
In terms of defining household type, we take two approaches. The first is to follow the traditional way in the literature of splitting households into types on one dimension: age groups, income quartiles etc. We term these dimensions "hard dimensions". The term is designed to invoke the notion that the researcher sets an a priori boundary when grouping the data under this approach. For example in splitting the sample by age groups: 26-35, 36-45 etc, the researcher imposes an implicit assumption that there may be a difference between a 35 year old and a 36 year old but none between a 34 year old and a 35 year old.
The different dimensions in the data cannot be thought of as being statistically independent of each other. For example, income and age are correlated, people who own their home rather than rent are likely to have higher incomes, and more qualified people generally have higher incomes (see Table 2.1). These relationships make identification of causality difficult. For example, if we find that the lowest income quartile had the lowest income growth, is this because they are more likely to be younger, or less qualified? Or is it not related to either of these? One possible solution is cross-tabulation: splitting the sample on one dimension (income), then another (age) and then another (qualification). However, in small samples this makes it difficult to ensure the statistical robustness of the results (and in some instances to comply with minimum sample confidentiality requirements). Given the 2006/07 HES sample has around 2,500 households, this will be a problem in our case.
Category | Disposable income (June 2007 year, $) |
---|---|
Home ownership status | |
Renters | 45,751 |
Mortgage holders | 54,358 |
Other | 75,229 |
Qualification | |
School or none | 45,910 |
Bachelor degree | 70,482 |
Post-graduate | 76,599 |
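A back-of-envelope calculation shows why cross-tabulation quickly exhausts a sample of this size. The sketch below assumes, purely for illustration, a split by four income quartiles, six age bands and three qualification categories, and treats households as evenly spread across cells, which in practice they are not. The function name is hypothetical.

```python
SAMPLE_SIZE = 2500  # approximate number of households in the 2006/07 HES


def average_cell_size(sample_size, *category_counts):
    """Return the number of cross-tabulation cells and the mean households per cell."""
    cells = 1
    for count in category_counts:
        cells *= count
    return cells, sample_size / cells


# Illustrative split: 4 income quartiles x 6 age bands x 3 qualification groups.
cells, per_cell = average_cell_size(SAMPLE_SIZE, 4, 6, 3)
print(cells, round(per_cell, 1))
```

Under these assumptions the sample is spread over 72 cells with roughly 35 households each on average, and the sparsest cells would hold far fewer.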
Our solution to both the aforementioned issues is "clustering". Clustering aims to group observations in the same cluster that are more similar to each other than they are to observations in different clusters on a number of dimensions. Put another way the goal of clustering is to partition observations into homogeneous clusters based on a number of attributes, while observations in different clusters are heterogeneous on those attributes. Thus one advantage of clustering is smaller sample sizes can be split on more dimensions than cross-tabulation to help deal with identification issues, while maintaining confidentiality and statistically significant sample sizes. Second, as opposed to splitting the data on hard dimensions, the data determine the boundaries of a cluster. Using our example from above of splitting the sample by age groups, clustering lets "the data decide" where the boundary lies rather than imposing it between 35 and 36. Finally, clustering captures natural correlations in the data (age and income) which allows us to provide intuitive descriptions of the characteristics of a cluster (eg young high income renters can be termed young professionals).
2.2 Measuring welfare#
The first two indicators of economic welfare we look at are household disposable income and the level and composition of expenditure. Examining how the recession has impacted on household expenditure is important for two reasons. First, slowing expenditure growth to help repair household balance sheets was a feature of the 2008/09 recession; therefore the prospects for a recovery in household expenditure are central to the prospects for a recovery in the wider economy. Secondly, expenditure is connected to the living standards of individuals and households.[5] Therefore changes in household expenditure, as well as household income, are important indicators of the extent to which recessions have a detrimental impact on the living standards of households.
Our third measure of the change in welfare, the equivalent variation, relates to the impact of price changes. The top 10 Consumers Price Index (CPI) categories by price increase between 2006/07 and 2009/10 are shown in Figure 2.1, and similarly those by price decrease are shown in Figure 2.2.[6] [7] The graphs illustrate that some goods and services experienced large price movements over the period. In particular, large items in the household budget (especially in low income earners' budgets), such as petrol, household energy and food, feature in Figure 2.1; additionally rents, although not in Figure 2.1, also increased 6.4%. The increase in rents is particularly important when taken in conjunction with the fact that the effective mortgage rate[8] fell from 7.98% to 6.85% in the period. This implies a relative shift in welfare, in a sense, from renters to mortgage holders.
- Figure 2.1: CPI increases
- Figure 2.2: CPI decreases
Clearly the exact measurement of how economic welfare for a household changed over the recession is contingent on the time period over which we measure the changes. Our analysis requires household expenditure broken down by individual expenditures on goods and services. The most recent edition of the Household Economic Survey with expenditure data ran from July 2009 to June 2010 (hereafter the 2009/10 HES). The previous edition with expenditure data ran from July 2006 to June 2007 (hereafter the 2006/07 HES).
In HES, information on expenditure is collected by a range of methods, including a 12 month recall of large products, information on latest payments for regular payments and a 14 day expenditure diary for adults. The survey also asks for detail on where households got their income over the previous 12 months; for example, wages and salaries, self-employment, investments, or benefits. This means expenditure data in HES 2006/07 and HES 2009/10 mainly covers the periods July 2006 to June 2007 and July 2009 to June 2010 respectively, while the income data covers the respective periods July 2005 to June 2007 and July 2008 to June 2010.[9] In a technical sense the recession ended in Q2 2009.[10] For this reason the expenditure data is our preferred measure of welfare, as the June years 2007 and 2010 are as close to shouldering either side of the recession as one can get on a June year basis. Households early in the 2009/10 sample will recall some income made in the second half of 2008, before the recession ended, meaning the income data may understate the recession's impact. More generally, this paper takes two snapshots in time as dictated by the data and sees how welfare has changed. The impacts of the recession have lasted longer than the June 2010 year, particularly in the labour market. Therefore this paper does not measure the full impacts of the recession, but rather the initial impacts as present in the data to the 2009/10 June year.
Notes
- [5] As it results in the purchase of goods and services from which households derive utility.
- [6] Statistics New Zealand removes any price changes associated with quality improvements from the CPI. Hence electronic type goods that have experienced rapid improvement with new technology show up in the CPI as having large price falls.
- [7] Change in the 2009Q3 to 2010Q2 average of the relevant CPI subcomponent index from the 2006Q3 to 2007Q2 average of the corresponding index.
- [8] The effective mortgage rate is the mortgage rate at each maturity weighted by the proportion of mortgages outstanding at each maturity and can be thought of as the average mortgage rate applying to households; it is available at: http://www.rbnz.govt.nz/statistics/exandint/b2/ [Treasury adjusted URL at 04 Mar 2024: https://www.rbnz.govt.nz/statistics/series/exchange-and-interest-rates/new-residential-mortgage-standard-interest-rates]
- [9] This is because for the 2009/10 HES those who are interviewed in July 2009 will recall income back to June 2008, while those interviewed in June 2010 will recall from July 2009 to June 2010.
- [10] Q3 2009 was the first quarter with positive real GDP growth.
3 Household types split by hard dimension#
3.1 The hard dimensions#
The hard dimensions we use to split the data are age, household ownership status, household structure, highest qualification and income quartile. Age refers to the age of the member of the household who earns the most; household ownership refers to whether the household is a renter, a mortgage holder (or owns outright) or 'other' (typically a trust type arrangement).[11] Qualification is a categorical variable, defined as whether the highest qualified person in the household has no tertiary qualification, a bachelor degree or a higher post-graduate degree. Household structure is defined by looking at the living status of the adults in the household (whether they are a couple, single or other), as well as whether there are children in the house.[12]
Finally the household is also assigned to an income quartile, based on its equivalised household income. Equivalised disposable income is the total income of a household, after tax, subsidies and other deductions, that is available for spending or saving, divided by the number of adult equivalent members. Younger household members are made equivalent to adults by weighting each according to their age. Disposable income is equivalised to allow for the tendency for household expenses to grow with household size, while also allowing for the fact that children need fewer resources than adults, ie the growth is not linear. There are various ways to calculate equivalised disposable income (see Table A.1 in Appendix A); we have chosen to use the Square Root Scale. Table A.1 shows this approach assumes there are more economies of scale in households than other scales, meaning expenses grow by less as household size increases compared to other measures.
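As a rough illustration, the sketch below applies the Square Root Scale as it is commonly defined, dividing household disposable income by the square root of the number of household members. The function name and the example figures are hypothetical and are not taken from HES; the exact variant used in the paper is set out in its Table A.1, which is not reproduced here.

```python
import math


def equivalised_income(disposable_income: float, household_size: int) -> float:
    """Square root scale: divide household income by sqrt(household size)."""
    if household_size < 1:
        raise ValueError("household_size must be at least 1")
    return disposable_income / math.sqrt(household_size)


# Hypothetical household: $60,000 of disposable income shared by four people.
print(equivalised_income(60_000, 4))  # 30000.0
```

Under this scale a four-person household needs twice, not four times, the income of a single person to reach the same equivalised income, which reflects the economies of scale described above.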
Table 3.1 gives the sample counts and population weights of households for the categories within each household type for both the 2006/07 and 2009/10 editions of HES. The use of population weights, according to Statistics N.Z. (2001), takes account of under-coverage in the survey of specified population groups. All our analysis is done by weighting the sample value of a variable by its population weight.
Household Type | Category | 2006/07 weight ('000) | 2006/07 n | 2009/10 weight ('000) | 2009/10 n |
---|---|---|---|---|---|
Home | Renting | 477 | 760 | 572 | 1,025 |
Home | Mortgage holders | 902 | 1,491 | 807 | 1,640 |
Home | Other | 191 | 299 | 244 | 461 |
Qualification[13] | School or none | 1,144 | 1,896 | 1,190 | 2,267 |
Qualification | Bachelor degree | 222 | 333 | 237 | 446 |
Qualification | Post-graduate | 185 | 291 | 180 | 375 |
Age | Under 25 | 95 | 144 | 92 | 160 |
Age | 25-34 | 254 | 406 | 264 | 484 |
Age | 35-44 | 363 | 562 | 336 | 641 |
Age | 45-54 | 316 | 524 | 344 | 631 |
Age | 55-64 | 233 | 348 | 254 | 525 |
Age | 65+ | 308 | 566 | 334 | 685 |
Income quartile | 1 | 393 | 670 | 406 | 789 |
Income quartile | 2 | 392 | 630 | 406 | 762 |
Income quartile | 3 | 393 | 627 | 406 | 791 |
Income quartile | 4 | 391 | 623 | 405 | 784 |
Household structure | Single, no children | 344 | 646 | 355 | 649 |
Household structure | Single, children | 134 | 243 | 147 | 301 |
Household structure | Couple, no children | 413 | 714 | 422 | 913 |
Household structure | Couple, children | 487 | 692 | 491 | 908 |
Household structure | Other, no children | 98 | 112 | 107 | 172 |
Household structure | Other, with children | 94 | 143 | 101 | 183 |
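The population weighting described before the table can be sketched as a weighted mean, in which each sampled household counts in proportion to the number of households it represents. The function name and the figures below are hypothetical and are not HES data.

```python
def weighted_mean(values, weights):
    """Mean of `values` where each observation counts `weight` times."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Three hypothetical sampled households and the number of households each represents.
incomes = [40_000, 60_000, 90_000]
weights = [700, 500, 300]

print(weighted_mean(incomes, weights))          # population-weighted mean
print(weighted_mean(incomes, [1, 1, 1]))        # unweighted sample mean, for comparison
```

Because the lower-income household carries the largest weight in this made-up example, the weighted mean sits below the unweighted sample mean.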
Notes
- [11] We looked at the demographic characteristics of those in the "other" household ownership status; they were generally older and had a diverse set of income sources (from investment etc) perhaps indicating a degree of financial knowledge/expertise hence our deduction that these are trust situations.
- [12] In explaining the more atypical household structures (see Table 3.1), the "other, no children" category is more likely to be a flatting/house sharing arrangement and "other, with children" may be a boarding arrangement or multiple families in one house.
- [13] There are some households that are not in any of these qualification categories as some qualifications are post school but not bachelor degrees (for example trade qualifications). The number of households in this category was small (5% of the sample).
4 Clusters#
4.1 The clustering technique#
Clustering techniques are commonly employed in applied data analysis, particularly marketing; an early survey of the use in this field is provided by Punj and Stewart (1983). The popularity of the approach in marketing is linked closely to the idea of market segmentation – the attempt to distinguish homogeneous groups of consumers who can be targeted in the same manner because they have similar characteristics and preferences. Given we are trying to establish a number of groups with broadly similar characteristics and preferences, this approach is attractive to us.
4.1.1 Dimensions for determining clusters
Punj and Stewart (1983) stress that the application of clustering techniques is not without its challenges. Reflecting on their meta-analysis of clustering studies, they suggest that attention to the dimensions used in determining the clusters is critical, as even one or two irrelevant dimensions may distort an otherwise useful analysis. They also state that there needs to be a rationale for inclusion, perhaps on the basis of theory or hypothesis. We start with the dimensions that we use to form our household types in Section 3 and supplement them with some additional dimensions that allow us to define our clusters more precisely. The dimensions we use are: age of highest income earner; number of children; qualification;[14] home ownership; household disposable income;[15] proportion of income from government transfers (excluding Working for Families); proportion of income from private and public pensions; proportion of income from investments; and proportion of income from private sources (excluding private pensions and investments).
The first 4 dimensions seek to ensure that households have similar demographics and therefore their tastes and preferences are broadly similar. The level of disposable income is also included for this reason but also as a measure of how well the household can absorb shocks. Finally we look at the proportion of income that comes from different sources. First, this gives us the ability to create clusters with varying sensitivity to different shocks (for example, a financial/housing market shock will affect a cluster with a higher proportion of their income from investments). Second, the sources of income contain some demographic information; for example, we can distinguish between working and non-working older people by the percentage of their pension incomes relative to the percentage of wage and salary income.
4.1.2 The clustering algorithm and the distance function
Punj and Stewart (1983) identify three interrelated issues that need to be addressed when clustering: first, the clustering algorithm to use; second, the measure of similarity between observations ("the distance measure"); and third, how the data should be standardised. Punj and Stewart (1983) suggest that the choice of the distance function and standardisation method is not critical; hence we do not spend much time discussing our assumptions around these.
In terms of identifying the algorithm, there are two broad types: hierarchical methods and iterative partitioning methods. Put simply hierarchical methods of clustering either adopt a "bottom up" or "top down" approach. Under the "bottom up" approach the starting point is each observation in its own cluster, with pairs of clusters then merged (to a point) based on similarity. Under the "top down" approach, all observations start in one cluster and are split recursively. Iterative partitioning methods adopt a different approach, breaking the sample initially into a set number of clusters and then allocating each observation to the nearest cluster. The centre of the cluster is then iteratively moved to ensure the final positions of the clusters best fit the data. A critical difference between the methods is that the iterative partitioning method can reallocate an observation to a different cluster to better fit the data; this is not possible under the hierarchical method. On the basis of their meta-analysis of previous empirical studies, Punj and Stewart (1983) conclude that generally hierarchical methods are inferior to iterative partitioning methods, hence we adopt an iterative partitioning method.
In terms of a specific iterative partitioning algorithm to use, Punj and Stewart (1983) state that K-means (discussed below) is more robust than other methods to outliers, to error perturbations in the distance measure and to the choice of distance measure. Additionally, it is less affected by irrelevant dimensions in determining the clusters. For these reasons we use an algorithm based on K-means, but modified, as we discuss below, to deal with its sensitivity to random starting points.
K-means is a centre-based algorithm. The algorithm seeks to position the centre of the cluster by minimising the average distance from each of the observations in a given cluster to the cluster's centre. Closeness of any observation to the centre of a cluster (Mi) is measured by the distance measure. To calculate the distance measure for a given household, for each dimension d described above (for example age, income etc), we create an index:
where wd is the weight we assign to the importance of dimension d and (if the values that dimension can take are numeric) ad is the observed value of that dimension for the given household standardised to a value between 0 and 1 based on its percentile relative to all observed values of that variable in the dataset. For categorical variables where percentiles are meaningless (qualification and home ownership) we create a variable for each possible outcome and assign either a 0 or a 1 depending on whether or not the household meets that outcome and multiply it by the dimension's weight. In Appendix D we briefly review the literature around clustering techniques and categorical variables to explain why we have adopted this treatment of categorical variables.
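The construction of the index can be sketched as follows. The percentile convention used here (the share of observed values at or below the household's value) is an assumption, as are the function names, weights and example data; the paper does not state which percentile definition or weights it uses.

```python
def percentile_rank(value, all_values):
    """Share of observed values that are <= `value`, a number in (0, 1]."""
    return sum(v <= value for v in all_values) / len(all_values)


def numeric_index(value, all_values, weight):
    """Weighted index for a numeric dimension such as age or income."""
    return weight * percentile_rank(value, all_values)


def categorical_index(category, categories, weight):
    """One weighted 0/1 entry per possible outcome of a categorical dimension."""
    return [weight * (1.0 if category == c else 0.0) for c in categories]


# Hypothetical data: five observed ages and three home ownership outcomes.
ages = [25, 32, 47, 59, 74]
tenures = ["renter", "mortgage holder", "other"]

print(numeric_index(47, ages, weight=1.0))
print(categorical_index("renter", tenures, weight=0.5))
```

Standardising on percentiles puts every numeric dimension on the same 0 to 1 scale regardless of its units, so the weights alone determine how much each dimension contributes to the distance.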
For 1,2,...,d dimensions there is a vector:
that describes each household; there is also a vector Mi:
of the values of the index for each dimension d at the centre of cluster i. The distance measure for a given household j is:
this is then summed over all individuals who are in cluster i, defined by having Mi as the closest centre.
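The quantities described in words above can be written out as follows. This is a reconstruction from those verbal definitions rather than a copy of the paper's displayed formulas: the labels $x_{j\ell}$, $\operatorname{dist}$ and $C_i$ are introduced here, and the squared Euclidean form of the distance is an assumption.

```latex
\begin{align*}
x_{j\ell} &= w_\ell \, a_{j\ell}, \qquad \ell = 1, 2, \dots, d
  && \text{(index for household $j$ on dimension $\ell$)} \\
X_j &= (x_{j1}, x_{j2}, \dots, x_{jd})
  && \text{(vector describing household $j$)} \\
M_i &= (m_{i1}, m_{i2}, \dots, m_{id})
  && \text{(centre of cluster $i$)} \\
\operatorname{dist}(X_j, M_i) &= \sum_{\ell=1}^{d} \left( x_{j\ell} - m_{i\ell} \right)^2
  && \text{(assumed squared Euclidean distance)} \\
\text{K-means objective} &= \sum_{i=1}^{K} \sum_{j \in C_i} \operatorname{dist}(X_j, M_i)
  && \text{($C_i$: households whose closest centre is $M_i$)}
\end{align*}
```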
4.1.3 The clustering algorithm
Before outlining the clustering algorithm, one issue that needs to be addressed is the selection of the number of clusters. We select the number of clusters by looking at the marginal contribution of an extra cluster to the RS measure of Sharma (1996). The RS measure quantifies between-cluster heterogeneity, which we are looking to maximise. We select 12 clusters because adding more than 12 clusters adds close to zero to the between-cluster heterogeneity measure, at the cost of decreasing the sample size in each cluster and thereby reducing the statistical robustness of the results. Appendix D provides more detail on the selection of the number of clusters and the RS measure.
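A minimal sketch of an RS-type measure is given below, assuming RS takes its usual form as the between-cluster sum of squares divided by the total sum of squares. The function name and the toy data are hypothetical, and the paper's exact calculation is described in its Appendix D rather than here.

```python
def r_squared(points, labels):
    """Between-cluster sum of squares as a share of the total sum of squares."""
    dims = range(len(points[0]))
    grand = [sum(p[k] for p in points) / len(points) for k in dims]
    ss_total = sum((p[k] - grand[k]) ** 2 for p in points for k in dims)

    # Group observations by cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    ss_within = 0.0
    for members in clusters.values():
        centre = [sum(p[k] for p in members) / len(members) for k in dims]
        ss_within += sum((p[k] - centre[k]) ** 2 for p in members for k in dims)

    # Total = between + within, so the between share is 1 - within / total.
    return 1.0 - ss_within / ss_total


# Toy data: two tight groups that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(r_squared(points, [0, 0, 0, 0]))  # a single cluster explains nothing
print(r_squared(points, [0, 0, 1, 1]))  # two clusters explain almost everything
```

Computing this measure for K = 1, 2, 3 and so on, and stopping where the gain from an extra cluster becomes negligible, is the selection rule described above.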
The algorithm that clusters the data is as follows:
1. Select K random starting points M1, M2, ..., MK from the data (as discussed above we have set the number of clusters, K, to 12).
2. For each random point Mi, find all observations that have Mi as the closest point, using the distance measure above.
3. Replace Mi with the centroid (mean) across the d dimensions of all the observations closest to Mi; this becomes the new Mi.
4. Repeat steps 2 and 3 until no cluster centre M1, M2, ..., MK changes when the centroid is calculated; that is, no improvement can be made by taking the mean of the closest observations from the points initially assigned in step 2. This is the same as saying that no household changes its assigned cluster.
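The steps above can be sketched in a few lines of plain Python. This is an illustrative implementation of standard K-means under an assumed squared Euclidean distance, not the authors' code; the function names, the seed handling and the toy data are all hypothetical.

```python
import random


def squared_distance(x, m):
    """Assumed distance measure: squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, m))


def nearest(x, centres):
    """Index of the centre closest to observation x."""
    return min(range(len(centres)), key=lambda i: squared_distance(x, centres[i]))


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]  # step 1: random starts
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centres) for p in points]  # step 2: assign
        if new_labels == labels:  # step 4: stop when no household moves
            break
        labels = new_labels
        for i in range(k):  # step 3: move each centre to its centroid
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous centre
                centres[i] = [sum(col) / len(members) for col in zip(*members)]
    return centres, labels


# Toy data: two well separated groups of three observations each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centres, labels = k_means(points, k=2)
print(labels)
```

On data this well separated the same partition is recovered from any pair of starting points; on real data the outcome can depend on the starting points, which is the weakness the K-harmonic means variant is meant to address.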
The form of the K-means algorithm we use is the K-harmonic means version. [16] Let:
be the distance measure that describes the distance between each observation X in the whole dataset Ω and all the centres M, summed across all observations. Specifically, the K-harmonic means minimises the following distance measure:
As can be seen with the inside summation over i, this measure considers the distance from every observation X to the centre of every cluster Mi, compared to the K-means approach which only considers the distance of X to its nearest centre. The K-harmonic means approach then seeks to find the K centres that minimise this distance function.
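To make the contrast concrete, the sketch below evaluates both objectives for a fixed set of centres. It assumes the K-harmonic means performance function in the form given by Zhang (2000): for each observation, K divided by the sum over all centres of the inverse squared distance, summed over observations. The function names, the small `eps` guard against division by zero and the example data are hypothetical.

```python
def squared_distance(x, m):
    """Assumed distance measure: squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, m))


def k_means_objective(points, centres):
    """Sum over observations of the distance to the nearest centre only."""
    return sum(min(squared_distance(x, m) for m in centres) for x in points)


def k_harmonic_means_objective(points, centres, eps=1e-12):
    """Sum over observations of the harmonic mean of distances to all centres."""
    k = len(centres)
    total = 0.0
    for x in points:
        # eps avoids dividing by zero when an observation sits on a centre.
        inverse_sum = sum(1.0 / max(squared_distance(x, m), eps) for m in centres)
        total += k / inverse_sum
    return total


# Hypothetical one-dimensional example with two centres.
points = [(0.0,), (2.0,)]
centres = [(1.0,), (5.0,)]
print(k_means_objective(points, centres))
print(k_harmonic_means_objective(points, centres))
```

Because the harmonic mean is dominated by the smallest distance but still responds to the others, every centre receives some pull from every observation, which is why the result depends less on where the centres start.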
The clusters were created by applying this algorithm to the 2006/07 Household Economic Survey dataset. In order to track how these clusters have fared post recession, we then applied the centres (ie the final vector of dimensions for each cluster) to the 2009/10 dataset. We discuss how the populations of the clusters changed between the two periods when we discuss the results in Section 5.2.3. Tests of how well the clusters fit the data are reported in Appendix D, including how the clusters would change if the algorithm were initially applied to the 2009/10 dataset. One important point to note is that the K-means algorithm does not make any statistical assumptions about the distribution of the variables it is clustering on and, as a result, all observations are included in one of the clusters, ie no observations are excluded from a cluster altogether. One possible extension of this work would be to make statistical assumptions around variable distributions, making it possible to test the statistical similarity of individual observations to the cluster centres and thus exclude observations that are not statistically similar to any cluster centre. Such an extension is beyond the scope of this paper, but represents an avenue for further research.
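Applying the 2006/07 centres to the 2009/10 data can be sketched as a single assignment pass with the centres held fixed. Nearest-centre assignment under a squared Euclidean distance is an assumption here, and the centres and households shown are made-up index values, not HES results.

```python
from collections import Counter


def squared_distance(x, m):
    """Assumed distance measure: squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, m))


def assign_to_centres(points, centres):
    """Label each observation with its nearest centre; centres are not updated."""
    return [
        min(range(len(centres)), key=lambda i: squared_distance(x, centres[i]))
        for x in points
    ]


# Hypothetical centres estimated on the earlier survey (two dimensions only).
centres_2007 = [(0.2, 0.3), (0.8, 0.7)]
# Hypothetical households from the later survey, on the same index scale.
households_2010 = [(0.1, 0.4), (0.9, 0.6), (0.6, 0.6)]

labels_2010 = assign_to_centres(households_2010, centres_2007)
print(labels_2010)
print(Counter(labels_2010))  # cluster sizes in the later survey
```

Holding the centres fixed keeps the definition of each household type constant across the two surveys, so changes in cluster size can be read as households moving between types.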
Notes
- [14] Based on the ordinal ranking system used in HES to rank qualifications from 0 (no qualification) to 8 (PhD); 5 is a bachelors degree, rather than the three categories outlined in Section 2.1. More details available on request.
- [15] Note we use disposable income rather than equivalised disposable income growth as number of children enters as another dimension.
- [16] Consistent with Punj and Stewart (1983), Zhang (2000) notes that the K-means method stands out, among the many clustering algorithms developed, as one of the most popular algorithms accepted by a range of applications, but also that the clusters it creates are very sensitive to the initial random values. The problem arises because the K-means approach minimises the distance from a data point to the closest centre. The K-harmonic means solves this problem by minimising the harmonic distance from the observations to all centres. The verification that this solves the initialisation problem is beyond the scope of this paper and the interested reader is referred to Zhang (2000) for its exposition.
4.2 The clusters#
4.2.1 Cluster descriptions#
This section outlines the 12 clusters created using the K-harmonic means. Figure 4.1 plots the created clusters by the age of the head of household, equivalised household disposable income and codes them by the percentage of the cluster that is a mortgage-holder. Table 4.1 and Table 4.2 provide some additional information on the clusters' income sources and demographic information (the reported numbers are arithmetic means for the cluster, except for the mortgage holders and single parents columns in Table 4.2 which are the percentage of the cluster that meet that criteria).
Cluster | Household disposable income ($) | Wage & Salary ($) | Transfers (ex WFF), % of income | Investment income, % of income | Pension, % of income | Private sources [17], % of income |
---|---|---|---|---|---|---|
A | 17,436 | 16,328 | 41 | 0 | 0 | 56 |
B | 20,146 | 33,694 | 25 | 1 | 1 | 73 |
C | 22,057 | 30,440 | 15 | 11 | 0 | 72 |
D | 12,435 | 7,137 | 45 | 4 | 0 | 44 |
E | 17,239 | 912 | 6 | 7 | 82 | 5 |
F | 28,702 | 35,397 | 4 | 4 | 0 | 92 |
G | 51,386 | 92,755 | 2 | 1 | 0 | 97 |
H | 21,835 | 4,956 | 4 | 11 | 72 | 13 |
I | 37,044 | 70,514 | 2 | 3 | 0 | 95 |
J | 43,306 | 69,709 | 2 | 3 | 0 | 96 |
K | 56,353 | 38,048 | 2 | 17 | 27 | 55 |
L | 65,599 | 91,394 | 1 | 6 | 0 | 93 |
Figure 4.1: Clusters by age, equivalised household disposable income and mortgage status
Cluster | Population (2006/07, 000) | Age | Children | Average qualification score | Mortgage holders (%) | Single parent (%) |
---|---|---|---|---|---|---|
A | 122 | 32 | 0.7 | 0.9 | 9 | 36 |
B | 105 | 36 | 2.9 | 1.7 | 29 | 20 |
C | 110 | 34 | 0.6 | 5.0 | 22 | 11 |
D | 104 | 59 | 0.1 | 1.5 | 50 | 9 |
E | 172 | 74 | 0.1 | 0.4 | 67 | 3 |
F | 124 | 52 | 0.2 | 3.0 | 87 | 10 |
G | 165 | 30 | 0.3 | 4.6 | 39 | 2 |
H | 108 | 72 | 0.1 | 3.4 | 69 | 3 |
I | 145 | 42 | 2.4 | 4.6 | 72 | 3 |
J | 183 | 44 | 0.7 | 1.3 | 80 | 4 |
K | 67 | 65 | 0.1 | 3.8 | 80 | 8 |
L | 165 | 52 | 0.2 | 5.3 | 72 | 4 |
Punj and Stewart (1983) state that "the ultimate test of a set of clusters is its usefulness. Thus the producer of cluster analysis should provide a demonstration that clusters are related to variables other than those used to generate the solution" (p.146). In this spirit, we cross check our clusters with information not used to form the clusters to test their validity. We use data from two sources to do this cross check. First, we use additional information on average payments for specific types of transfer payment (Table 4.3).[18] For example we would expect a cluster with more children per household to have higher Working for Families payments. Our second cross check is to look at expenditure data from HES. In Table 4.4 we report selected items that are important to the household budget. We index an individual cluster's budget share on a particular good to the average for all clusters. If a particular cluster's budget share on an item is the same as (above) average, that item has an index value of (above) 100. Using this approach we would, for example, expect clusters with a high proportion of renters to have a higher than average share of their budget devoted to rents and vice versa for mortgage holders.
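The budget share index described above can be illustrated with a short sketch. The shares used are hypothetical and the function name is ours; the only thing taken from the text is the rule that a cluster share equal to the all-cluster average maps to 100.

```python
def budget_share_index(cluster_share_pct, average_share_pct):
    """Index a cluster's budget share to the all-cluster average (average = 100)."""
    return 100.0 * cluster_share_pct / average_share_pct

# Hypothetical shares, in percent of total expenditure.
average_rent_share = 12.0
renter_cluster_share = 30.0
owner_cluster_share = 3.0

print(budget_share_index(renter_cluster_share, average_rent_share))  # 250.0
print(budget_share_index(owner_cluster_share, average_rent_share))   # 25.0
print(budget_share_index(average_rent_share, average_rent_share))    # 100.0
```

An index of 250 would therefore mean the cluster devotes two and a half times the average share of its budget to that item, not that it spends two and a half times as many dollars.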
Cluster | DPB ($) | IB ($) | SB ($) | UB ($) | FTC ($) | IWTC ($) | MFTC ($) | PTC ($) | NZS ($) |
---|---|---|---|---|---|---|---|---|---|
A | 3,277 | 1,064 | 480 | 511 | 2,360 | 414 | 10 | 15 | 152 |
B | 2,075 | 284 | 566 | 145 | 6,810 | 1,955 | 58 | 129 | 256 |
C | 775 | 781 | 107 | 235 | 1,621 | 603 | 21 | 96 | 79 |
D | 468 | 1,785 | 1,281 | 876 | 301 | 22 | - | - | - |
E | 270 | 93 | 51 | 121 | 179 | - | - | - | 15,058 |
F | 234 | 229 | 202 | - | 470 | 327 | 7 | - | - |
G | - | 140 | 187 | 78 | 218 | 202 | - | 8 | 16 |
H | 34 | 180 | 17 | 105 | 198 | - | - | - | 16,717 |
I | 179 | 30 | 7 | 43 | 1,437 | 1,286 | - | 38 | - |
J | 259 | 385 | 75 | - | 413 | 523 | - | 10 | 56 |
K | 62 | 134 | 99 | 66 | 325 | 151 | 34 | 11 | 12,806 |
L | 59 | 262 | 52 | 47 | 77 | 83 | - | - | - |
DPB = Domestic Purposes Benefit
IB = Invalid's Benefit
SB = Sickness Benefit
UB = Unemployment Benefit
FTC = Family Tax Credit
IWTC = In-Work Tax Credit
MFTC = Minimum Family Tax Credit
PTC = Parental Tax Credit
NZS = New Zealand Superannuation
Cluster | Food excluding restaurants | Actual rental for housing | Household energy | Petrol | Mortgage interest payments |
---|---|---|---|---|---|
A | 113 | 426 | 135 | 111 | 25 |
B | 138 | 262 | 124 | 147 | 83 |
C | 95 | 268 | 103 | 114 | 80 |
D | 121 | 153 | 168 | 92 | 49 |
E | 135 | 82 | 199 | 85 | 11 |
F | 93 | 22 | 110 | 91 | 166 |
G | 82 | 182 | 66 | 99 | 175 |
H | 118 | 27 | 130 | 94 | 42 |
I | 100 | 26 | 86 | 87 | 206 |
J | 97 | 30 | 95 | 102 | 213 |
K | 100 | 27 | 95 | 87 | 37 |
L | 90 | 22 | 79 | 102 | 161 |
Ave. Budget share | 14% | 12% | 5% | 5% | 5% |
Clusters A and B are young low-income households. A receives a high proportion of income from transfers and receives higher than average unemployment benefit receipts. B has more children, more wage/salary income and lower benefit payments (outside those related to family assistance and the sickness benefit) than A. Consistent with B having the highest average number of children and a relatively low wage and salary income, B receives more from the family assistance benefit types (Family Tax Credits, FTC; and In Work Tax Credits, IWTC) than any other cluster. Both A and B are generally renting and relative to other clusters have a high proportion of single parent families (36% and 20% respectively); consistent with this they receive the highest average Domestic Purposes Benefit (DPB) payments. Cross checking these clusters against expenditure data shows, consistent with these clusters being mainly renters, rents in their budget share are over-represented and mortgage payments under-represented relative to all clusters. In line with the fact that households in B have a relatively high number of children on average, they spend relatively more on food and petrol.
D is the mid-to-later life beneficiary cluster, with relatively large average payments of unemployment (UB), invalid (IB) and sickness benefit (SB) and relatively low wage and salary income.
E, H and K are the older households. Households in cluster K are generally either working New Zealand Superannuation (NZS) recipients or nearing retirement, and have higher income and higher qualifications relative to the other older clusters and more diverse income streams (a higher proportion of their income is from investments). Households in cluster E are, on average, older than K, and appear to be fully retired, receiving over 80% of their income from pensions - the highest percentage of any cluster. Households in cluster H are, in a way, an intermediate cluster between K and E. They are of a similar average age to cluster E, but receive more income from wages/salary. This cluster may therefore be more likely to be doing some work (ie part time) in their retirement. E and H receive approximately the same equivalised disposable income, despite the fact that H has a higher disposable income. Looking into the data further reveals that about two-thirds of the households in cluster E are single person households as opposed to 25% in cluster H, hence the larger adjustment of H's disposable income when equivalised. Reflecting their relatively low income (as opposed to the other older cluster, K), E and H spend a higher proportion of their budget on the necessities of life: food and household energy.
Figure 4.1 shows that between 60% and 80% of households in clusters E, K and H are mortgage holders (or have fully repaid a mortgage). Given their life stage this may seem low, but structuring of their affairs into a trust/company structure (which is classified as 'other' in HES) may be biasing down this result. Additionally, these older clusters have a low budget share of mortgage payments (Table 4.4); this is consistent with their more advanced age giving them time to have paid their mortgage off.
There are two young highly qualified clusters C and G. Cluster G could be characterised as being a cluster of young well-paid professionals, with high equivalised disposable income reflecting their high salary and lack of children. Reflecting their high income, cluster G has a relatively low budget share of food and household energy. [19] Households in cluster C, although similarly qualified as G, receive lower wage and salary income. This may reflect the fact they are qualified in different areas than G, or have had trouble getting a well paying job despite their qualifications.
There are two middle aged mortgage holding clusters: I and J. Cluster I is more highly qualified, has higher average wage/salary, but has more children meaning their equivalised income is about the same as cluster J. Given households in both these clusters are generally mortgage holders, we see, as we would expect, mortgage payments over-represented in their budget shares.
Members of L are mid-life highly qualified high earners. It possibly represents the later life stage of G and I. Consistent with their status as high income earners, those in cluster L have high investment income, have a higher relative budget share on luxury items, such as international air travel, audio-visual and computing equipment and major cultural and recreational equipment, and a lower budget share on necessities (see Table 4.4). Households in Cluster F are, on average, roughly the same age as L, slightly less qualified and on lower incomes.
All in all cross checking our clusters against their budget shares of different items and composition of their transfer payments shows what we would expect to see and gives us some confidence in the clusters.
Notes
- [17] Excluding investment and private pension income.
- [18] We used the proportion of total transfers of disposable income to form our clusters but not information on the composition of benefits.
- [19] One interesting point is both mortgage payments and rent are over-represented in cluster G's budget share. This is because 39% of this cluster are mortgage holders, meaning this cluster is a mixture of mortgage holders and renters.
5 Household income and expenditure#
5.1 Metrics#
As discussed in Section 2, the first two indicators we use to examine the welfare changes over the recession are household disposable income and expenditure on goods and services. Based on Statistics N.Z. (1996) we also allocate the expenditure categories (see Table A.3 in Appendix A) from HES into durable or non-durable expenditure. [20] We are interested in changes over the recession in durable expenditure for three reasons. First, many durables are long lived, so it is possible to delay their replacement in the face of income or wealth loss. Second, the slowdown in the housing market means the so-called "housing furnishing" channel [21] will be slower. Third, some durables are likely to be funded by credit; Reserve Bank data showed that, at an aggregate level, annual household debt growth was 1.0% in December 2011 compared to around 13% in 2007. Therefore the 2008/09 recession may have seen some households voluntarily deleverage and/or other households face an involuntary reduction in credit (owing to a tightening in bank lending standards and the collapse of small finance companies).
5.2 Results#
We start with the results based on household types created by splitting the sample on hard dimensions. By contrasting our clustering results against the hard dimension results we are able to point out the advantages of using the clustering technique. All values reported in the tables are the weighted (by population weights) arithmetic mean values for the relevant group within each category.
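As a reminder of what a population-weighted arithmetic mean does, the following minimal sketch compares it with the unweighted mean. The incomes and weights are hypothetical; in practice the weights would be the survey's population weights for each sampled household.

```python
def weighted_mean(values, weights):
    """Arithmetic mean of values, each counted in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical sampled households: disposable income and the number of
# population households each one represents.
incomes = [20_000.0, 40_000.0, 90_000.0]
weights = [300.0, 500.0, 200.0]

print(weighted_mean(incomes, weights))  # 44000.0
print(sum(incomes) / len(incomes))      # 50000.0 (unweighted, for comparison)
```

The weighted figure is lower here because the high-income household represents fewer households in the population than the other two.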
Notes
- [20] Some are also classified as neither. This is because that expenditure category is a service or the expenditure category is a combination of durable/non-durable/service and it is therefore hard to allocate it into one of those groups. It is possible to break some of these durable/non-durable/service expenditure categories down further but this results in a large number of zero data points meaning any inference is likely to be questionable. This means the change in durable budget share plus change in non-durable budget share does not sum to zero.
- [21] Purchase of new housing goods when you move to a new house. Therefore lower turnover in the housing market means less of these purchases.
5.2.1 Hard dimension results: income and expenditure#
Looking at the split by age group in Table 5.1 there is relatively stagnant growth in disposable income in the youngest age group (less than general CPI inflation) compared with the older working age groups. This may be owing to slow growth or falls in employment in the younger age groups during the recession. Figure 5.1 shows that between the June years 2006/07 and 2009/10, employment in the under 25 category fell by 10% according to the Household Labour Force Survey (HLFS), significantly more than other age cohorts. Figure 5.1 also shows that the two older age groups, 55-64 and 65+, experienced the strongest employment growth, which is consistent with our finding that the two older age groups had the strongest disposable income growth.
Age group | Expenditure 06/07 ($) | Expenditure 09/10 ($) | Expenditure growth (%) | Disposable Income 06/07 ($) | Disposable Income 09/10 ($) | Disposable Income growth (%) |
---|---|---|---|---|---|---|
Under 25 | 47,003 | 48,752 | 4 | 57,171 | 58,025 | 1 |
25-34 | 53,976 | 54,778 | 1 | 58,454 | 66,187 | 13 |
35-44 | 57,175 | 57,573 | 1 | 56,591 | 63,922 | 13 |
45-54 | 60,256 | 66,708 | 11 | 70,658 | 77,266 | 9 |
55-64 | 47,093 | 56,652 | 20 | 52,758 | 71,150 | 35 |
65+ | 29,300 | 32,050 | 9 | 31,726 | 41,871 | 32 |
Figure 5.1: HLFS average employment growth by age group between 2006/07 and 2009/10
Comparing the 4 quartiles (see Table 5.2), the marginally stronger household disposable income growth in the lower quartile may reflect employment income growth[22] and/or the increase in transfers (mainly around family assistance) to lower income deciles over the period (see Figure 5.2). Growth in expenditure between the two periods was higher in the lower three equivalised income quartiles. At least for the lower two income quartiles this could reflect an increase in the price of non-durable necessities: there was an increase in the budget share of non-durables of 1.0 and 1.7 percentage points for quartile 1 and 2 respectively.[23] Given the fact that the non-durable necessities that increased in price relatively (for example, food) are likely to be a higher proportion of expenditure of those on lower income and, given these goods are relatively inelastic, we would expect total expenditure to increase for these income quartiles when non-durable necessity prices increase.
Equivalised income quartile | Expenditure 06/07 ($) | Expenditure 09/10 ($) | Expenditure growth (%) | Disposable Income 06/07 ($) | Disposable Income 09/10 ($) | Disposable Income growth (%) |
---|---|---|---|---|---|---|
1 | 24,686 | 28,004 | 13 | 17,942 | 22,185 | 24 |
2 | 40,551 | 44,125 | 9 | 37,563 | 45,094 | 20 |
3 | 53,371 | 58,246 | 9 | 56,030 | 65,893 | 18 |
4 | 77,158 | 81,323 | 5 | 101,477 | 119,285 | 18 |
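The growth columns in the quartile table can be reproduced from the level columns, as the sketch below shows. The dollar figures are copied from the table; rounding to the nearest whole percent is our assumption about how the published growth rates were presented.

```python
# (expenditure 06/07, expenditure 09/10, income 06/07, income 09/10) by quartile,
# taken from the equivalised income quartile table.
quartile_levels = {
    1: (24_686, 28_004, 17_942, 22_185),
    2: (40_551, 44_125, 37_563, 45_094),
    3: (53_371, 58_246, 56_030, 65_893),
    4: (77_158, 81_323, 101_477, 119_285),
}

def growth_pct(old, new):
    """Percentage growth from old to new, rounded to a whole percent."""
    return round(100.0 * (new / old - 1.0))

computed = {
    q: (growth_pct(e0, e1), growth_pct(y0, y1))
    for q, (e0, e1, y0, y1) in quartile_levels.items()
}
print(computed)  # {1: (13, 24), 2: (9, 20), 3: (9, 18), 4: (5, 18)}
```

These match the published expenditure and disposable income growth rates for each quartile, with income growth exceeding expenditure growth in all four.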
Figure 5.2: Change in transfers (2006/07 to 2009/10), by decile, as % of 2006/07 average decile disposable income [24]
The family structure results also appear to reflect the growth in family assistance mentioned above. The two categories with children that have the lowest average income, 'single, with children' and 'other, with children', experienced the strongest disposable income growth. Another striking result is the large increase in the non-durables budget share of the 'other, with no children' group (up 4.0 percentage points). This household type spends a large share of its budget on alcohol and restaurant/takeaway food, both of which increased in price. Their slow income growth, and therefore slow expenditure growth, means this household type may have been forced to cut back on durables to maintain spending on non-durables as prices increased. This contrasts with the 'other, with children' group, whose relatively strong disposable income growth has allowed them to increase their expenditure in the face of price increases of necessities; as a consequence their budget share of non-durables has had to increase by less (only 1.0 percentage point).
Family structure | Expenditure 06/07 ($) | Expenditure 09/10 ($) | Expenditure growth (%) | Disposable Income 06/07 ($) | Disposable Income 09/10 ($) | Disposable Income growth (%) |
---|---|---|---|---|---|---|
Single, no children | 26,655 | 29,177 | 9 | 28,904 | 34,362 | 19 |
Single, children | 34,530 | 36,522 | 6 | 31,653 | 39,061 | 23 |
Couple, no children | 54,274 | 59,710 | 10 | 61,807 | 75,620 | 22 |
Couple, children | 64,187 | 69,067 | 8 | 65,244 | 73,410 | 13 |
Other, with no children | 64,026 | 61,721 | -4 | 78,657 | 81,517 | 4 |
Other, with children | 51,053 | 63,412 | 24 | 62,898 | 77,934 | 24 |
Table 5.4 shows there was stronger disposable income growth in the more qualified categories. This picture is consistent with full time equivalent (FTE) employment growth and earnings growth by industry shown in Table 5.5, with industries that may be considered to have more highly qualified people, such as professional, scientific, technical and support services; public administration, education and training; and health care and social assistance all experiencing relatively strong earnings and employment growth over the period.
Qualification | Expenditure 06/07 ($) | Expenditure 09/10 ($) | Expenditure growth (%) | Disposable Income 06/07 ($) | Disposable Income 09/10 ($) | Disposable Income growth (%) |
---|---|---|---|---|---|---|
School or none | 41,943 | 45,792 | 9 | 45,910 | 53,454 | 16 |
Bachelor degree | 68,923 | 69,359 | 1 | 70,482 | 88,139 | 25 |
Post-graduate | 70,071 | 75,656 | 8 | 76,599 | 91,786 | 20 |
Industry | Average weekly earnings growth (%) [25] | FTE employees growth (%) [26] |
---|---|---|
Forestry and Mining | 11 | 0 |
Manufacturing | 9 | -11 |
Electricity, Gas, Water and Waste Services | 12 | 4 |
Construction | 11 | -9 |
Wholesale Trade | 8 | -3 |
Retail Trade | 10 | -5 |
Accommodation and Food Services | 17 | -3 |
Transport, Postal and Warehousing | 13 | -3 |
Information Media and Telecommunications | 9 | -3 |
Financial and Insurance Services | 12 | -8 |
Rental, Hiring and Real Estate Services | 8 | 5 |
Professional, Scientific, Technical, Administrative and Support Services | 11 | 12 |
Public Administration and Safety | 12 | 7 |
Education and Training | 10 | 3 |
Health Care and Social Assistance | 17 | 7 |
Arts, Recreation and Other Services | 6 | 1 |
Total All Industries | 12 | -1 |
Notes
- [22] Quartile 1 had the strongest wage and/or salary income growth, 16%, relative to 14% for all quartiles. We cannot be definitive but this may reflect increased hours worked per household and/or wage increases. Part of these wage increases could reflect minimum wage changes. The adult minimum wage rose from $11.25 an hour to $12 an hour on 1 April 2008 which would have boosted incomes for some workers in the lower quartile.
- [23] Owing to space the budget share of durables and non-durables are not reported in the tables.
- [24] Includes Working for Families, NZS and Veterans Pension, Income replacement and Housing Supplement.
- [25] Growth between 2006/07 and 2009/10.
- [26] Growth between 2006/07 and 2009/10.
5.2.2 Hard dimension results: durables expenditure#
The fall in durables expenditure as a proportion of total expenditure was larger for non-renters (Figure 5.3). This fall could be consistent with any of the theories we put forward regarding how the recession could have impacted on durable expenditure. First, homeowners (mortgage holders or trust type arrangements) are more likely to buy durable products for their new houses, therefore the slowdown in the turnover in the housing stock may have slowed durables purchases. Second, homeowners are likely to have higher initial levels of debt in the form of mortgages, so either they have experienced a voluntary or involuntary slowdown in debt growth, and debt is primarily used to purchase durables. Third, the fall in wealth owing to falling house prices could reduce or delay discretionary spending, some of which is likely to be durable.
Figure 5.3: Durable expenditure by home ownership status [27]
Figure 5.4 shows durable expenditure has fallen the most as a percentage of their budget share for the two middle-aged age groups: 35 to 44 and 45 to 54, and people in the older (65+) age group. Smith (2007) reports older households have relatively low levels of mortgage debt but high home ownership, and are more likely to trade down into cheaper housing and spend the equity. The relatively large fall in durables in the budget share in the older age group may reflect less churn in the market making downsizing harder and therefore older households not being either able to release equity in their house to fund their spending or alternatively, having less demand for new durables (through the housing furnishing channel) as they stay in their existing house. In terms of these competing explanations, it is interesting to note that for the 65+ age group average expenditure and income were similar in the 2006/07 year but average expenditure was significantly less than income in the 2009/10 year. This may indicate that nervousness about being able to access their housing equity meant older households are starting to save more.[28]
Figure 5.4: Durable expenditure by age
Identifying the cause of the large fall in budget shares of durables of the 35 to 55 age group is more difficult. They are the age group more likely to be trading up in terms of housing, therefore the fall in durables may reflect less churn in the housing market slowing the housing furnishing channel. Alternatively this age group is likely to have a lot of debt in the form of mortgages (in 2009/10 they spent 7.2% of their income on mortgage payments, 1.0 percentage point more than the next highest age group), therefore they may have been more precautionary in light of this, especially in the presence of house price falls decreasing their ratio of assets to debt. Consistent with the final explanation, Smith (2007), using HES data, found for this age group that even non-housing expenditure growth is more correlated with the average house price than other age groups. We also find this, with Figure 5.4 showing that those in the 35-44 and 45-54 age groups had the largest fall in budget share of non-housing related durables.
Looking at the split by income quartiles supports the hypothesis that housing market gearing may be having a role in reducing durables expenditure for the middle aged groups (Figure 5.5). Kida (2009), also using HES data, found that higher income households tended to be more highly geared, with the third to fifth quintiles having the highest ratios of outstanding mortgage debt to home value, with the fourth quintile being the most highly geared. Kida (2009) suggests that this makes them the most exposed to risk from falling house prices. Given the biggest drop in durables (especially non-housing durables) was in quartile 3 (followed by 4 and 2), and given the fourth quintile would most likely be in our third quartile,[29] this may (and we emphasise may as this analysis is indicative only, see comments below) mean that these households have voluntarily reduced their debt and therefore durable expenditure in light of this vulnerability. Given the earnings cycle means that those in the 35 to 55 age are more likely to be in the upper income quartiles, this may explain why this age group has seen a large drop in durables in their budget share.
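The statement that the fourth quintile would mostly fall in the third quartile can be checked with simple percentile arithmetic. The sketch assumes both groupings rank households on the same income measure, which may not hold exactly between Kida (2009) and this paper.

```python
def overlap(band_a, band_b):
    """Length of the overlap between two percentile bands (lower, upper)."""
    lower = max(band_a[0], band_b[0])
    upper = min(band_a[1], band_b[1])
    return max(0, upper - lower)

fourth_quintile = (60, 80)  # 60th to 80th percentile, ie deciles 7 and 8
quartiles = {1: (0, 25), 2: (25, 50), 3: (50, 75), 4: (75, 100)}

quintile_width = fourth_quintile[1] - fourth_quintile[0]
share_in_quartile = {
    q: overlap(fourth_quintile, band) / quintile_width
    for q, band in quartiles.items()
}
print(share_in_quartile)  # {1: 0.0, 2: 0.0, 3: 0.75, 4: 0.25}
```

Under that assumption three-quarters of the fourth quintile sits in quartile 3 and the remaining quarter in quartile 4.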
Figure 5.5: Durable expenditure by income quartile
We caution that the falls in durables described above should not be taken as estimates of the impact of housing debt on different households, or even a formal test of their presence. Differences we observe across home ownership status, income quartiles and age groups could be owing to reasons other than precaution in the face of gearing: other plausible explanations include different shocks to other income sources (for example, employment) or wealth effects from other non-housing assets.
Notes
- [27] Housing durables includes materials for property improvements, furniture, furnishings and floor coverings, textiles and appliances.
- [28] This may also be owing to the different timing of income and expenditure measurement as discussed in Section 2.2.
- [29] Indeed breaking the change in durables down by deciles confirms this with deciles 7 and 8 (quintile 4) having the second and third largest falls in durables expenditure as a proportion of budget share.
5.2.3 Cluster results#
Table 5.6 presents the population count for each cluster in both 2006/07 and 2009/10. Several results stand out. The number of households in cluster K, the working older cluster, has shown significant growth (30%) between 2006/07 and 2009/10, reflecting the strong growth in employment in the 60+ age groups. Whilst acknowledging the sampling issues with older age groups in the Household Labour Force Survey (HLFS), the HLFS shows that between the June quarter 2008 and the June quarter 2010 employment in the 60-64 and 65+ age groups grew 23% and 27% respectively (the corresponding population growth in those age groups was 11% and 8% over the same time period). Cluster E, the predominately 'retired' cluster, held about constant in number of households, whilst cluster H, retirees with some wage income, grew 7%, a direction consistent with employment growth in the 65+ age group in the HLFS.
Two of the three predominately renting younger clusters (A and C) also grew strongly, at the expense of I and J, the predominately young mortgage holding clusters. This perhaps indicates either that the tightening of lending standards by banks during the recession has impacted on the ability of younger households to get into the housing market, or alternatively that there has been a reluctance on the part of younger households to take on mortgage debt to buy a house. Another interesting result is the growth of C but the decline in G. Remember that households in C and G were on average similarly aged and qualified but had different incomes. The growth in membership of cluster C at the expense of G may reflect that the recession has made it harder for new graduates to find well paying jobs and hence more are in the lower income cluster.
Cluster | 2006/07 ('000) | 2009/10 ('000) | Growth (%) |
---|---|---|---|
A | 122 | 156 | 28 |
B | 105 | 104 | 0 |
C | 110 | 132 | 20 |
D | 104 | 103 | -2 |
E | 172 | 172 | 0 |
F | 124 | 138 | 11 |
G | 165 | 159 | -4 |
H | 108 | 115 | 7 |
I | 145 | 138 | -5 |
J | 183 | 163 | -11 |
K | 67 | 87 | 30 |
L | 165 | 158 | -4 |
In addition to being the fastest-growing cluster, Table 5.7 shows that amongst the clusters the second strongest average income growth was recorded by Cluster K (33%); again this, and the 22% income growth of cluster H (retirees with some wage income), most probably reflects the strong increase in older age employment referred to earlier.[30] Disposable income growth was 18% for the other older cluster E; this illustrates the advantage of our approach: the outcomes over the recession for older households were different depending on whether they were more likely to be working.
Cluster | Expenditure 06/07 ($) | Expenditure 09/10 ($) | Expenditure growth (%) | Disposable Income 06/07 ($) | Disposable Income 09/10 ($) | Disposable Income growth (%) | Δ budget share of durables | Δ budget share of non-durables |
---|---|---|---|---|---|---|---|---|
A | 29,141 | 31,705 | 9 | 24,712 | 31,406 | 27 | -2.1 | 1.5 |
B | 40,162 | 46,838 | 17 | 44,139 | 50,826 | 15 | -3.4 | 2.6 |
C | 43,017 | 45,475 | 6 | 32,301 | 38,082 | 18 | -2.4 | 1.1 |
D | 21,823 | 25,934 | 19 | 14,062 | 19,109 | 36 | -4.5 | 1.9 |
E | 19,417 | 21,137 | 9 | 20,230 | 23,913 | 18 | -4.4 | 2.4 |
F | 43,690 | 47,574 | 9 | 36,280 | 43,932 | 21 | -1.1 | 2.3 |
G | 66,849 | 70,738 | 6 | 82,616 | 98,273 | 19 | -0.4 | 1.4 |
H | 33,844 | 34,487 | 2 | 29,165 | 35,641 | 22 | -2.2 | 4.3 |
I | 78,454 | 85,485 | 9 | 78,259 | 97,016 | 24 | -2.6 | 2.2 |
J | 59,606 | 68,085 | 14 | 70,721 | 83,381 | 18 | -3.1 | 0.6 |
K | 56,499 | 64,952 | 15 | 81,565 | 108,165 | 33 | -5.7 | 2.6 |
L | 80,961 | 83,541 | 3 | 104,893 | 116,815 | 11 | -4.2 | -0.2 |
The other two clusters that experienced relatively strong disposable income growth were A and D, both dependent on government transfers and the minimum wage, which as we saw above increased during the period. B and D had the highest growth in expenditure. In the case of D, this increased expenditure appears to be going on increased mortgage payments (mortgage payments increased by 1.4 percentage points in this cluster's budget share over the period) despite the fall in the effective mortgage rate, indicating that households in this cluster are trying to pay down debt. Another interesting point is that for three of the four very poorest clusters (A, C and D) expenditure outweighs disposable income in both years (see Table 5.7), but over the period income growth was stronger than expenditure growth, meaning that, all else equal, their level of dissaving fell.[31]
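The fall in dissaving for clusters A, C and D can be read directly off the cluster averages in Table 5.7, as the following sketch shows. It uses the published mean expenditure and disposable income figures; treating the gap between the two means as "dissaving" is a simplification, subject to the timing caveat in note [31].

```python
# (expenditure 06/07, expenditure 09/10, income 06/07, income 09/10), Table 5.7.
cluster_levels = {
    "A": (29_141, 31_705, 24_712, 31_406),
    "C": (43_017, 45_475, 32_301, 38_082),
    "D": (21_823, 25_934, 14_062, 19_109),
}

# Excess of average expenditure over average disposable income in each year.
gaps = {
    name: (e0 - y0, e1 - y1)
    for name, (e0, e1, y0, y1) in cluster_levels.items()
}
for name, (gap_0607, gap_0910) in gaps.items():
    print(name, gap_0607, gap_0910, gap_0910 < gap_0607)
# A 4429 299 True
# C 10716 7393 True
# D 7761 6825 True
```

In each case expenditure still exceeds income in 2009/10, but by a smaller dollar amount than in 2006/07.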
Figure 5.6 plots the reduction of the budget share of durables against the percentage of the cluster that is a mortgage holder and we see the slope is negative. Section 5.2.2 outlined some of the reasons why we would expect such a slope in the presence of a slowing housing market, these include the wealth effects of falling house prices, mortgage holders having higher debt levels and less of a housing furnishing channel. Figure 5.6 shows that the reduction in budget share of durables is large for L, K, E and D, even after accounting for their large percentage of mortgage holders. L and K respectively received around $1700 and $1500 from housing investments in 2006/07, significantly higher than the next highest cluster I, which received $800. This means they are likely to be highly geared with respect to the housing market and therefore more sensitive to house price falls; this is consistent with the story we told in the previous section. D cut back on their durables expenditure significantly also and, as we noted above, significantly increased the share of their budget devoted to mortgage payments. This cluster is a low income, lowly qualified, mortgage holding cluster, therefore their reduction in durables to possibly fund their increased mortgage payments to pay off debt faster may be more precautionary (especially given the relative performance of qualified versus non-qualified jobs over the period we discussed before). Cluster E (older non-workers) also reduced their budget share of durables by a lot relative to their percentage of mortgage holders - again this may reflect a lower prevalence of active housing equity withdrawal (or lack of access to credit for older age groups) meaning these households are either delaying or stopping discretionary spending on durable goods. This is opposed to the other older cluster H, which still, on average, has some wage income and therefore may not be so reliant on housing equity to support durable expenditure.
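As a rough check on the pattern described for Figure 5.6, the sketch below fits an unweighted least-squares line through the twelve cluster averages, using the 2006/07 mortgage holder percentages from Table 4.2 and the change in the durables budget share from Table 5.7. The choice of a simple linear fit, and of these two published columns as inputs, is our assumption; the paper presents a scatter plot and may use different underlying values.

```python
clusters = list("ABCDEFGHIJKL")
mortgage_pct = [9, 29, 22, 50, 67, 87, 39, 69, 72, 80, 80, 72]            # Table 4.2
delta_durables = [-2.1, -3.4, -2.4, -4.5, -4.4, -1.1,
                  -0.4, -2.2, -2.6, -3.1, -5.7, -4.2]                     # Table 5.7

n = len(clusters)
mean_x = sum(mortgage_pct) / n
mean_y = sum(delta_durables) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(mortgage_pct, delta_durables))
sxx = sum((x - mean_x) ** 2 for x in mortgage_pct)

slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual: actual change minus the change predicted from mortgage holding alone.
residuals = {
    c: y - (intercept + slope * x)
    for c, x, y in zip(clusters, mortgage_pct, delta_durables)
}
largest_unexplained_falls = sorted(residuals, key=residuals.get)[:4]

print(round(slope, 3))             # about -0.015
print(largest_unexplained_falls)   # ['K', 'D', 'E', 'L']
```

On these inputs the fitted slope is negative but shallow, and the four clusters with the largest falls relative to the fitted line are K, D, E and L, the same four singled out in the text.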
Figure 5.6: Mortgage holding vs reduction in durables in budget share
Notes
- [30] To explain this with an example, those in cluster K are most probably working but not necessarily, ie, some of the cluster will not be working but are very similar on other characteristics to those who are. Therefore the increase in average income of cluster K could be because a higher proportion of cluster K begin to work and thus lifts the average income.
- [31] This may also be owing to the different timing of income and expenditure measurement as discussed in Section 2.2.
6 The welfare effects of price changes#
In the first parts of this section we measure the welfare effects of the price changes that occurred over the recession. Whilst not all price changes are directly attributable to the recession, between 2006/07 and 2009/10 there were large price changes in expenditure categories that typically receive a large share of the household budget: large increases in the prices of food, fuel, insurance, energy, local authority rates and rents, and falls in mortgage rates. Because these goods make up varying proportions of different households' expenditure bundles, the price changes generate different welfare changes amongst households.[32]
In Subsection 6.4 we aggregate the welfare changes owing to both expenditure and price changes to examine the total welfare effects of the recession for the different household types.
6.1 Equivalent variation as a measure of welfare#
We follow the approach of Creedy (1998), who uses the Linear Expenditure System (LES) to derive the equivalent variation (EV) measure. This approach explicitly assumes preference heterogeneity between household types and clusters, and assumes households within the same group (where household types are split on hard dimensions) or the same cluster have the same preferences. This assumption motivated the first part of this paper: establishing household types that could be assumed to be internally homogeneous but heterogeneous from each other.[33]
The technical details of the EV measure and how it is derived are available in Appendix E. The equivalent variation can be expressed in terms of the expenditure function (E(.,.)) as:[34]

$$EV = E(p_1, U_1) - E(p_0, U_1)$$

p0 and p1 are the old and new prices respectively, and U1 is the new utility level after the price changes. Equivalent variation is the maximum amount the individual would be prepared to pay, in the presence of new prices, to return to the old prices and hence can be thought of as the welfare loss associated with price changes in the economy. We also normalise the EV by 2006/07 expenditure to examine the proportionate change in EV relative to total spending (hereafter EV/Exp) and get a sense of how progressive or regressive the welfare impact of the price changes is.
Notes
- [32] Table A.3 in Appendix A shows the price changes (as measured by the CPI) over the period by expenditure component.
- [33] The use of a LES does give rise to potential well known problems with additivity (see Deaton (1974)), although the level of aggregation we generally use on the expenditure groups may mean these issues are less severe.
- [34] The expenditure function is the minimum expenditure required to reach a level of utility at current prices.
6.2 Results by hard dimension household types#
Table 6.1 shows that equivalent variation normalised by 2006/07 expenditure (EV/Exp) was higher for renters than for non-renters (mortgage holders and 'other', with 'other' typically being trust type arrangements). This reflects the fact that rents rose relative to mortgage interest rates over the period of analysis and that renters spend a larger proportion of their expenditure on goods that increased in price (food, fuel and household energy for example), so these price changes affect them proportionally more. The absolute level of EV is roughly similar for renters and mortgage holders, but significantly higher for the 'other' group.
Up to the 44-54 age group, the absolute level of EV rises with age (apart from the younger than 25 age group), reflecting the correlation of age with income to that point (and therefore absolute expenditure on goods that increased in price). Looking at EV/Exp, the story is different: the youngest and the two oldest age groups have the highest EV as a proportion of expenditure. This reflects their lower incomes, and therefore food, fuel and household energy being a higher proportion of their expenditure bundle. In addition the very young are likely to be renters, as opposed to mortgage holders, whilst those in the older age groups are likely to have paid off their mortgage, meaning mortgage payments are not a large proportion of their expenditure.
Household type | EV ($) | EV/Exp (%) |
---|---|---|
Home Ownership status | ||
Renters | 2,816 | 5.4 |
Mortgage holders | 2,999 | 4.4 |
Other | 3,832 | 4.6 |
Age group | ||
Under 25 | 3,040 | 4.7 |
25-34 | 2,740 | 3.9 |
35-44 | 2,744 | 4.1 |
44-54 | 3,730 | 4.6 |
55-64 | 3,685 | 5.2 |
65+ | 2,253 | 5.5 |
Qualification | ||
School or none | 2,715 | 5.0 |
Bachelor degree | 3,578 | 3.9 |
Post-graduate | 3,814 | 4.5 |
Equivalised income quartile | ||
YQ1 | 1,953 | 8.8 |
YQ2 | 2,584 | 5.7 |
YQ3 | 2,966 | 4.6 |
YQ4 | 4,105 | 3.4 |
Family structure | ||
Single with no children | 1,762 | 5.0 |
Single, children | 1,611 | 5.5 |
Couple, no children | 3,612 | 4.5 |
Couple, children | 3,635 | 4.7 |
Other, no children | 3,450 | 4.0 |
Other, children | 3,888 | 4.8 |
Other interesting results are the income quartile and family structure results. The goods which experienced large relative price increases (food, fuel and household energy) represent a larger absolute amount of the expenditure bundle of those in higher income quartiles but a smaller proportion of their expenditure. Therefore the absolute welfare loss owing to relative price changes is larger for households in higher income quartiles but lower as a percentage of expenditure. Single-parent households have the highest EV/Exp of the family structure groups, reflecting the high proportion of necessities that increased in price in their budget share owing to their lower incomes and children. In fact all households with children have a higher EV/Exp than the corresponding household without children (eg Couple with children has a higher EV/Exp than Couple without children). This reflects the high spending of households with children on food, energy and petrol relative to total expenditure.
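The contrast between absolute and proportionate losses can be checked directly from the income quartile rows of Table 6.1. The short Python sketch below is illustrative only: it copies the published figures, and the helper name `implied_expenditure` is our own. Because the published shares are rounded, the expenditure levels it backs out are approximate.

```python
# Income quartile rows of Table 6.1: (EV in dollars, EV/Exp in percent).
TABLE_6_1 = {
    "YQ1": (1953, 8.8),
    "YQ2": (2584, 5.7),
    "YQ3": (2966, 4.6),
    "YQ4": (4105, 3.4),
}


def implied_expenditure(ev, ev_share_pct):
    """Back out 2006/07 expenditure from EV and EV/Exp (approximate: inputs are rounded)."""
    return ev / (ev_share_pct / 100.0)


for quartile, (ev, share) in TABLE_6_1.items():
    spend = implied_expenditure(ev, share)
    print(f"{quartile}: EV ${ev:,}, EV/Exp {share}%, implied expenditure ${spend:,.0f}")
```

The dollar loss rises from the first to the fourth quartile while the loss as a share of expenditure falls, which is the regressive pattern described above.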
6.3 Results by cluster#
Figure 6.1 shows that EV increases with the total expenditure of clusters, reflecting the fact that clusters with higher expenditure spend higher absolute amounts on goods that increased in price. However once we normalise EV by expenditure (Figure 6.2), the trend reverses because the goods which increased in price are a higher proportionate amount of the expenditure bundle of those with low expenditure.
- Figure 6.1: Equivalent variation versus 2007 expenditure
- Figure 6.2: Equivalent variation / 2007 expenditure versus 2007 expenditure
Figure 6.3 plots EV/Exp against the average age of the cluster. This, we believe, illustrates the strength of our approach. Clusters A, B, C and G all have approximately the same average age but have markedly different EV normalised by expenditure. The less well off clusters (A, B and C), which spent a higher proportion of their expenditure on necessities, are relatively more affected than cluster G. G was also less affected because it has a higher proportion of mortgage holders, and mortgage rates decreased relative to rents. Even amongst clusters A, B and C there are differences in EV, with cluster B being the most affected by price increases. This reflects the fact that cluster B has the highest average number of children, meaning, as Table 4.4 shows, that this cluster spends a significantly higher share of its budget on food and petrol, which were amongst the items with the largest price increases. Using the hard dimension approach, EV/Exp is 4.6% for the 25-34 age group but, as we have shown, this number hides a large heterogeneity in the range of outcomes for people in that age group depending on their different attributes (for example, income, home ownership status and children), which shows the advantages of our clustering approach.
- Figure 6.3: EV normalised by 2007 expenditure versus age
6.4 Aggregate welfare effects#
Creedy (2004) shows that the equivalent variation measures of the welfare effects of price changes and of expenditure changes[35] can be aggregated to calculate an overall welfare effect. Table 6.2[36] presents the results for our hard dimension groups. Younger age groups were worse off over the recession, with the welfare gain owing to increased expenditure more than offset by the welfare loss owing to price changes; conversely the 55-64 age group had the biggest welfare improvement, reflecting strong growth in expenditure brought about by increased employment over the period. Both expenditure and the welfare lost owing to price changes increase with income quartile, although the welfare gains from expenditure increases outweigh the losses from price increases for the first three income quartiles (and are largest for the lowest income quartile when expressed as a percentage of 2006/07 expenditure).
Household type | EV - Expenditure | EV - Price | Aggregate EV | % of 2006/7 Expenditure |
---|---|---|---|---|
Home ownership status | ||||
Renters | 3,646 | -2,816 | 830 | 2% |
Mortgage holders | 3,880 | -2,999 | 881 | 2% |
Other | 3,532 | -3,832 | -300 | 0% |
Age group | ||||
Under 25 | 1,750 | -3,040 | -1,290 | -3% |
25-34 | 803 | -2,740 | -1,937 | -4% |
35-44 | 398 | -2,744 | -2,346 | -4% |
44-54 | 6,453 | -3,730 | 2,723 | 5% |
55-64 | 9,559 | -3,685 | 5,874 | 12% |
65+ | 2,750 | -2,253 | 497 | 2% |
Income quartile | ||||
YQ1 | 3,318 | -1,953 | 1,365 | 6% |
YQ2 | 3,574 | -2,584 | 990 | 2% |
YQ3 | 4,875 | -2,966 | 1,909 | 4% |
YQ4 | 4,165 | -4,105 | 60 | 0% |
Qualification | ||||
School or none | 3,850 | -2,715 | 1,135 | 3% |
Bachelor degree | 436 | -3,578 | -3,142 | -5% |
Post-graduate | 5,585 | -3,814 | 1,771 | 3% |
Family structure | ||||
Single, no children | 2,522 | -1,762 | 760 | 3% |
Single, children | 1,992 | -1,611 | 381 | 1% |
Couple, no children | 5,440 | -3,612 | 1,828 | 3% |
Couple, children | 4,880 | -3,635 | 1,245 | 2% |
Other, no children | -2,305 | -3,450 | -5,755 | -9% |
Other, children | 12,359 | -3,888 | 8,471 | 17% |
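The aggregation in Table 6.2 is a signed sum: the expenditure component enters with its own sign and the price component enters as a negative number when prices rise. The Python sketch below reproduces the income quartile rows as a check; the function name `aggregate_ev` is ours and the figures are copied from the table.

```python
# Income quartile rows of Table 6.2:
# (EV - Expenditure, EV - Price, published Aggregate EV), all in dollars.
TABLE_6_2 = {
    "YQ1": (3318, -1953, 1365),
    "YQ2": (3574, -2584, 990),
    "YQ3": (4875, -2966, 1909),
    "YQ4": (4165, -4105, 60),
}


def aggregate_ev(ev_expenditure, ev_price):
    """Overall welfare change: expenditure gain plus the (negative) price effect."""
    return ev_expenditure + ev_price


for quartile, (ev_exp, ev_price, published) in TABLE_6_2.items():
    print(quartile, aggregate_ev(ev_exp, ev_price), "published:", published)
```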
Cluster | EV - expenditure | EV - Price | Aggregate EV | % of 2006/7 expenditure |
---|---|---|---|---|
A | 2,563 | -2,039 | 524 | 2% |
B | 6,676 | -3,247 | 3,429 | 9% |
C | 2,458 | -2,525 | -66 | 0% |
D | 4,111 | -1,890 | 2,222 | 10% |
E | 1,720 | -1,590 | 130 | 1% |
F | 3,884 | -2,943 | 941 | 2% |
G | 3,889 | -3,547 | 342 | 1% |
H | 643 | -2,393 | -1,750 | -5% |
I | 7,031 | -4,242 | 2,789 | 4% |
J | 8,479 | -3,503 | 4,975 | 8% |
K | 8,453 | -4,363 | 4,089 | 7% |
L | 2,580 | -4,525 | -1,946 | -2% |
Table 6.3 shows the results by cluster. Clusters J, K, B and D had significant welfare improvements over the recession (when their aggregate EV is normalised by 2006/07 expenditure). Figure 6.2 shows that clusters B, D and K all had large welfare losses from price changes as a percentage of total expenditure, so their welfare gains result from large increases in expenditure. Clusters D and K both experienced strong income growth, with growth in their expenditure still below income growth. Conversely, cluster B's income growth was slightly less than its expenditure growth. Cluster J's improving welfare, on the other hand, appears to be a function of moderate welfare losses from price changes and moderate gains in expenditure.
Notes
- [35] The equivalent variation of the welfare change owing to expenditure change is simply equivalent to the dollar value of the change in expenditure.
- [36] In Tables 6.2 and 6.3 we have changed the sign on the EV metric for price changes to negative to indicate that price increases detract from welfare.
7 Conclusion#
This paper offers two contributions. One is to quantify welfare changes between 2006/07 and 2009/10 for different types of households in New Zealand; a period that included the 2008/09 recession. Given the data available, and given the effects of the recession are likely to be ongoing, we have only quantified some initial impacts. We highlighted one often overlooked channel through which there can be variations in welfare changes for different household types: price changes. We found those in the low income groups, those with children and/or those who rented had large welfare losses owing to price changes. The relatively large impact on low income groups and those with children reflects that goods that increased in price were generally a larger part of their expenditure bundle; whilst the larger welfare impact on renters versus homeowners reflects rents rising relative to mortgage rates. For those in lower income groups, these welfare losses owing to price changes were more than offset by strong expenditure growth. However it is intuitively clear that these groups are still worse off than if there had been no recession.
The second contribution of this paper is the application of clustering to form the household types. The advantage of clustering techniques is that they allow us to follow through time household types that are more 'similar' on a number of dimensions than can be achieved with groups split on one hard dimension. This allows us to pick up differences within certain groups that may otherwise be missed. For example, we created three older clusters (crudely, one that is working, one that is working part-time and one where people are retired) and showed that in the time period studied the older working and part-time working clusters grew in number and experienced strong income growth, whilst the non-working one did not. By differentiating the younger age group by home ownership status, qualification and income level, our clusters showed that over the recession there was, first, a shift from home ownership towards renting in the younger age group, perhaps reflecting a reluctance to take on debt or a tightening of lending standards by banks. Second, it was harder for younger qualified people to find high paying jobs. Third, by differentiating households on their exposure to the housing market, we were able to show that highly geared households and households with large mortgage payments relative to income reduced their durable expenditure, perhaps indicating a desire to reduce debt.
References#
Creedy, J. (1998). Measuring the welfare effects of price changes: A convenient parametric approach. Australian Economic Papers, 37(2):137-51.
Creedy, J. (2004). The excess burden of taxation. Australian Economic Papers, 37(4):454-464.
Creedy, J. and Sleeman, C. (2006). The Distribution Effects of Indirect Taxes: Models and Applications from New Zealand. Cheltenham: Edward Elgar, UK.
Deaton, A. (1974). A reconsideration of the empirical implications of additive preferences. Economic Journal, 84:338-348.
Huang, Z. (1998). Extensions to the K-means algorithm for clustering large datasets with categorical values. Data mining and knowledge discovery, 2:283-304.
Kida, M. (2009). Financial vulnerability of mortgage-indebted households. Reserve Bank of New Zealand Bulletin, 72(1).
Punj, G. and Stewart, D. (1983). Cluster analysis in marketing research: Review and suggestions for application. Journal of Marketing Research, 20:134-148.
Ralambondrainy, H. (1995). A conceptual version of the K-means algorithm. Pattern Recognition Letters, 16:1147-1157.
Sharma, S. (1996). Applied multivariate techniques. Wiley and Son, New York.
Smith, M. (2007). Microeconomic analysis of household expenditures and their relationship with house prices. Reserve Bank of New Zealand Bulletin, 70(4).
Statistics N.Z. (1996). Quarterly Gross Domestic Product: Sources and Methods. Statistics N.Z.
Statistics N.Z. (2001). The introduction of integrated weighting to the 2000/2001 household economic survey. Statistics N.Z.
Zhang, B. (2000). Generalized K-harmonic means: Boosting in unsupervised learning. Technical Report HPL-2000-137, Hewlett-Packard Laboratories.
Appendix A Additional tables#
Equivalence scales by household size
Household size | Per-capita income | "Oxford" scale ("Old OECD scale") | "OECD-modified" scale | Square root scale | Household income |
---|---|---|---|---|---|
1 adult | 1 | 1 | 1 | 1 | 1 |
2 adults | 2 | 1.7 | 1.5 | 1.4 | 1 |
2 adults, 1 child | 3 | 2.2 | 1.8 | 1.7 | 1 |
2 adults, 2 children | 4 | 2.7 | 2.1 | 2.0 | 1 |
2 adults, 3 children | 5 | 3.2 | 2.4 | 2.2 | 1 |
Elasticity[37] | 1 | 0.73 | 0.53 | 0.50 | 0 |
Source: OECD: Adjusting Household Incomes: Equivalence Scales[38]
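The scale values in the table follow from simple rules, sketched below in Python. The function names are ours. The Oxford scale assigns 1 to the first adult, 0.7 to each further adult and 0.5 to each child; the OECD-modified scale uses 1, 0.5 and 0.3; the square root scale is the square root of household size. Equivalised income is household income divided by the chosen scale.

```python
def per_capita(adults, children):
    """Scale equal to household size (equivalence elasticity of 1)."""
    return float(adults + children)


def oxford(adults, children):
    """'Old OECD' scale: 1 for the first adult, 0.7 per extra adult, 0.5 per child."""
    return 1.0 + 0.7 * (adults - 1) + 0.5 * children


def oecd_modified(adults, children):
    """OECD-modified scale: 1 for the first adult, 0.5 per extra adult, 0.3 per child."""
    return 1.0 + 0.5 * (adults - 1) + 0.3 * children


def square_root(adults, children):
    """Square root of household size (equivalence elasticity of 0.5)."""
    return (adults + children) ** 0.5


def equivalised_income(household_income, scale):
    """Household income adjusted for household size and composition."""
    return household_income / scale


for adults, children in [(1, 0), (2, 0), (2, 1), (2, 2), (2, 3)]:
    row = [f(adults, children) for f in (per_capita, oxford, oecd_modified, square_root)]
    print(adults, children, [round(v, 1) for v in row])
```

For example, a hypothetical two adult, one child household with $60,000 of disposable income has an OECD-modified scale of 1.8.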
Demographic dimension | Weight |
---|---|
Age of Primary Income Earner | 700 |
Equivalised Household Disposable Income | 800 |
Number of Children | 150 |
Qualification | 75 |
Household Ownership | 200 |
Proportion of Other Govt exc WfF income | 200 |
Proportion of Investment income | 200 |
Proportion of Pension related income | 650 |
Proportion of private income | 150 |
Expenditure component | Price change (2006/07 to 2009/10) |
---|---|
Food exc Restaurants (nd) | 14.59% |
Restaurant (nd) | 11.82% |
Alcoholic beverages (nd) | 8.96% |
Cigarettes and tobacco (nd) | 10.75% |
Clothing | 3.25% |
Footwear | 0.51% |
Actual rentals for housing | 6.39% |
Purchase of housing | -1.20% |
Materials for property alterations, additions and improvements (d) | 8.49% |
Services for property alterations, additions and improvements | 8.79% |
Property maintenance | 8.65% |
Property rates and related services | 16.43% |
Household energy (nd) | 14.07% |
Furniture, furnishings and floor coverings (d) | -2.78% |
Household textiles (d) | 2.55% |
Household appliances (d) | 3.10% |
Purchase of vehicles (d) | 5.60% |
Petrol (nd) | 10.66% |
Domestic air transport | -9.03% |
International air transport | -3.05% |
Telecommunication equipment (d) | -70.71% |
Telecommunication services | -0.58% |
Audio-visual and computing equipment (d) | -57.22% |
Major recreational and cultural equipment (d) | 8.62% |
Other recreational equipment and supplies | 6.79% |
Recreational and cultural services | 7.19% |
Newspapers, books and stationery (nd) | 13.04% |
Accommodation services | 5.71% |
Package holidays | 6.01% |
Miscellaneous domestic holiday costs | 8.46% |
Interest payments on personal loans | -6.33% |
Interest payments on credit sales (hire purchases) | -6.33% |
Health | 6.90% |
Insurance | 12.60% |
Other interest payments | -6.33% |
(d) - Durable (nd) - Non-durable
The "price" of mortgage payments was based on changes in the RBNZ effective mortgage rate.
The large falls in Telecommunication equipment and Audio-visual and computing equipment prices are owing to Statistics New Zealand adjusting the price changes in these items for quality improvements.
Source: Statistics New Zealand (Consumers Price Index)
Notes
- [37] Using household size as the determinant, equivalence scales can be expressed through an "equivalence elasticity", ie the power by which economic needs change with household size. The equivalence elasticity can range from 0 (when unadjusted household disposable income is taken as the income measure) to 1 (when per capita household income is used); the smaller the value for this elasticity, the higher the assumed economies of scale in consumption.
- [38] Available at www.oecd.org/dataoecd/61/52/35411111.pdf [accessed 23 May 2012].
Appendix B Sensitivity to weight selection#
There is relatively little guidance on selecting the appropriate weights to put on each dimension in the distance function outlined in Section 4.1.3, especially when using the K-harmonic means approach in the presence of categorical variables. In forming the weights we started by assuming uniform weights on all the variables. Initial investigation of the results showed that such an approach put too much weight on the categorical variables (home ownership and qualification); that is, clusters were being formed primarily on these categorical variables rather than on the numeric data. To counter this we increased the relative weights on the dimensions we thought were important in describing household characteristics (primarily income and age) until we got clusters that were reasonably robust to small changes in relative weights. The additional dimensions were then added with a lower weight to help us refine the clusters (ie, distinguish between clusters of similar age and income).
To see how sensitive cluster membership is to changing relative weights, we look at what happens when we zero weight income, thereby creating new relative weights (see Table B.1). Zero weighting income also allows us to address a potential criticism of our approach. Owing to social mobility, there are likely to be changes in where people sit in the income distribution, so the demographics (in terms of age, qualification etc) of the income earner at any given income percentile are unlikely to be the same in the two time periods studied. This potentially opens us to the criticism that we are not really tracking 'like' people through time in terms of demographics and are really just tracking people with 'like' incomes.
Demographic Dimension | Original | New |
---|---|---|
Age of Primary Income Earner | 22% | 30% |
Household Disposable Income | 26% | - |
Number of Children | 5% | 6% |
Qualification | 2% | 3% |
Household Ownership | 6% | 9% |
Proportion of government transfers (Ex WfF) | 6% | 9% |
Proportion of Investment income | 6% | 9% |
Proportion of Pension related income | 21% | 28% |
Proportion of private income | 5% | 6% |
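The relative weights in Table B.1 can be reproduced from the raw weights in Table A.2 by dividing each weight by the total, with income dropped for the "New" column. The Python sketch below is illustrative; the function name `relative_weights` is ours and the dimension names follow Table A.2.

```python
INCOME = "Equivalised Household Disposable Income"

# Raw distance-function weights from Table A.2.
WEIGHTS = {
    "Age of Primary Income Earner": 700,
    INCOME: 800,
    "Number of Children": 150,
    "Qualification": 75,
    "Household Ownership": 200,
    "Proportion of Other Govt exc WfF income": 200,
    "Proportion of Investment income": 200,
    "Proportion of Pension related income": 650,
    "Proportion of private income": 150,
}


def relative_weights(weights, exclude=()):
    """Express weights as percentages of their total, after dropping excluded dimensions."""
    kept = {name: w for name, w in weights.items() if name not in exclude}
    total = sum(kept.values())
    return {name: 100.0 * w / total for name, w in kept.items()}


original = relative_weights(WEIGHTS)
zero_income = relative_weights(WEIGHTS, exclude=(INCOME,))
for name in WEIGHTS:
    new = f"{zero_income[name]:.0f}%" if name in zero_income else "-"
    print(f"{name}: {original[name]:.0f}% -> {new}")
```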
A useful device for comparing the result of zero weighting income is the transition matrix, presented in Table B.2. Table B.2 shows the percentage of the original cluster that ended up in the new clusters (on the y axis). The results are encouraging: all clusters maintain between 97% and 100% of their membership. Given that we zero weighted the dimension with the largest weight, this gives us a reasonable degree of confidence that the clusters are stable to weight selection and that when we track clusters through time we are tracking people of 'like' demographics. Because so many of our variables are highly correlated (age, percentage of income from pensions and home ownership status, for example), different relative weights at the margin should not radically affect cluster membership. This is because we minimise the distance function on many dimensions. For an older household, for example, the distance between the household and its cluster centre is going to be small on each of age, home ownership status and the proportion of income from pensions, so changing the relative weights on these dimensions is not going to materially change the composition of the cluster.
Rows are the original clusters; columns are the clusters formed excluding the income dimension.
Original clusters | I | J | G | K | D | F | C | H | L | A | B | E |
---|---|---|---|---|---|---|---|---|---|---|---|---|
I | 100% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
J | 2% | 97% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
G | 0% | 0% | 99% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
K | 0% | 0% | 0% | 100% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
D | 0% | 0% | 0% | 0% | 98% | 2% | 0% | 0% | 0% | 0% | 0% | 0% |
F | 0% | 2% | 0% | 0% | 0% | 97% | 0% | 0% | 0% | 0% | 0% | 0% |
C | 0% | 0% | 0% | 0% | 0% | 2% | 98% | 0% | 0% | 1% | 0% | 0% |
H | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 100% | 0% | 0% | 0% | 0% |
L | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 99% | 0% | 0% | 0% |
A | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 100% | 0% | 0% |
B | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 100% | 0% |
E | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 100% |
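A transition matrix of this kind is a row-normalised cross-tabulation of two cluster assignments for the same households. The Python sketch below shows the calculation on a handful of made-up labels; the function name and the toy data are ours, and survey weights, which the HES analysis would need, are ignored for simplicity.

```python
from collections import Counter


def transition_matrix(original, new):
    """Percentage of each original cluster's members found in each new cluster."""
    pair_counts = Counter(zip(original, new))
    row_totals = Counter(original)
    columns = sorted(set(new))
    return {
        row: {col: 100.0 * pair_counts[(row, col)] / row_totals[row] for col in columns}
        for row in sorted(row_totals)
    }


# Hypothetical assignments for six households under the two weighting schemes.
original_labels = ["I", "I", "J", "J", "J", "J"]
new_labels = ["I", "I", "J", "J", "J", "I"]

matrix = transition_matrix(original_labels, new_labels)
for row, shares in matrix.items():
    print(row, shares)
```

Each row sums to 100%, and the diagonal entry is the share of the cluster that keeps its membership.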
Our second robustness check is to cluster the 2009/10 dataset initially (ie, apply the algorithm to the 2009/10 dataset) and then compare these clusters to the 2009/10 clusters created under our original methodology. Table B.3 presents the results. With the exception of cluster L, clusters retain between 87% and 100% of their membership, which is encouraging in terms of the clusters' robustness. Cluster L maintains two-thirds of its membership, which is still relatively high, but it does mean that, relative to other clusters, we need to caution against attaching too much significance to this cluster's results.
Rows are the 2009/10 HES clusters; columns are the original clusters.
2009/10 HES clusters | K | J | F | I | E | L | C | H | G | B | A | D |
---|---|---|---|---|---|---|---|---|---|---|---|---|
K | 98% | 0% | 0% | 1% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
J | 0% | 100% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
F | 0% | 0% | 97% | 0% | 0% | 0% | 1% | 1% | 0% | 0% | 0% | 0% |
I | 1% | 0% | 2% | 94% | 0% | 1% | 0% | 0% | 0% | 0% | 0% | 1% |
E | 0% | 0% | 0% | 0% | 93% | 1% | 0% | 0% | 0% | 0% | 0% | 6% |
L | 0% | 0% | 0% | 0% | 1% | 67% | 23% | 0% | 0% | 3% | 0% | 6% |
C | 0% | 0% | 0% | 0% | 0% | 1% | 88% | 0% | 0% | 10% | 0% | 0% |
H | 0% | 0% | 0% | 0% | 0% | 0% | 12% | 87% | 0% | 0% | 0% | 0% |
G | 0% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 97% | 0% | 0% | 0% |
B | 0% | 0% | 0% | 0% | 2% | 5% | 0% | 0% | 0% | 92% | 0% | 0% |
A | 0% | 0% | 4% | 0% | 0% | 0% | 0% | 7% | 0% | 0% | 88% | 0% |
D | 0% | 0% | 3% | 0% | 0% | 0% | 3% | 4% | 0% | 0% | 0% | 89% |
Appendix C Categorical variables and clustering algorithms#
Huang (1998) states that standard hierarchical clustering methods can handle data with numeric and categorical values. However, this author notes that the computational cost makes them unacceptable for clustering large data sets. This, of course, is in addition to the other issues discussed in Section 4.1 regarding hierarchical methods. Huang (1998) notes that while the K-means clustering method is efficient for processing large data sets, the K-means algorithm only works on continuous data because it minimises the distance function by changing the means of clusters. This prohibits it from being used in applications where categorical data are involved.
Huang (1998) proposes the K-modes approach to deal with categorical data. However, the drawback of this approach is that it does not allow numeric and categorical data to be combined in a single clustering technique in which the numeric data are clustered using the K-harmonic means approach. As a middle ground we therefore follow Ralambondrainy (1995).
Ralambondrainy (1995) presented an approach to using the K-means algorithm to cluster categorical data. Ralambondrainy's approach is to convert multiple category attributes into binary attributes (using 0 and 1 to represent whether the household displays that attribute or not) and to treat the binary attributes as numeric in the K-means algorithm. We slightly modify Ralambondrainy's (1995) approach for the K-harmonic means algorithm. Huang (1998) states that the drawback of this approach is that the cluster means for categorical variables, given by real values between 0 and 1, do not describe the characteristics of the clusters. However, by taking a simple ex post frequency of the households in a cluster that display an attribute, we are able to describe the cluster's characteristics. For example, by counting the number of households in the cluster who rent, then counting those who have a mortgage, and comparing them, we are able to describe whether the cluster is predominantly renter or mortgage holder.
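The binary conversion and the ex post frequency count can be sketched in a few lines of Python. Everything here is illustrative: the category labels, the toy cluster and the function names are ours, and the clustering step itself is omitted.

```python
from collections import Counter

TENURE = ("rent", "mortgage", "other")  # hypothetical category labels


def encode(age, tenure):
    """One numeric attribute followed by one 0/1 column per tenure category."""
    return [float(age)] + [1.0 if tenure == t else 0.0 for t in TENURE]


def tenure_frequencies(tenures):
    """Share of households in each tenure category, counted after clustering."""
    counts = Counter(tenures)
    return {t: counts[t] / len(tenures) for t in TENURE}


# A made-up cluster of four households: (age of primary earner, tenure).
cluster = [(28, "rent"), (31, "rent"), (35, "mortgage"), (30, "rent")]

encoded = [encode(age, tenure) for age, tenure in cluster]
column_means = [sum(col) / len(col) for col in zip(*encoded)]
frequencies = tenure_frequencies([tenure for _, tenure in cluster])

print(column_means)
print(frequencies)
```

In this toy case the mean of each binary column equals the share of households in that category, so the frequency count recovers a readable description (here, a predominantly renting cluster).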
Appendix D Assessing the 'goodness of fit' of clusters#
Clustering aims to partition observations into homogeneous clusters based on a set number of attributes, while observations in different clusters are heterogeneous on those attributes. In this appendix we examine how different the clusters are from one another and also which clusters are relatively more or less homogeneous.
Sharma (1996) proposes a measure of the heterogeneity between clusters, RS:

$$RS = \frac{SST - SSW}{SST}$$

where SST is the total sum of squares, the distance between all observations as measured by the distance function, and SSW is the sum of squares within clusters as measured by the distance function.
The value of RS ranges from 0 to 1, with 0 indicating no difference between clusters and 1 the maximum possible.
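As a minimal sketch of how such a statistic behaves, the Python below computes RS as (SST - SSW) / SST for a toy data set. It assumes plain squared Euclidean distance to cluster means, which is a simplification: the paper's own weighted distance function from Section 4.1.3 is not reproduced, and the function names and data are ours.

```python
def centroid(points):
    """Mean of each coordinate across the points."""
    return [sum(col) / len(points) for col in zip(*points)]


def sum_of_squares(points, centre):
    """Total squared Euclidean distance from the points to a centre."""
    return sum(sum((x - c) ** 2 for x, c in zip(p, centre)) for p in points)


def r_squared(points, labels):
    """RS = (SST - SSW) / SST, with SSW summed within each cluster."""
    sst = sum_of_squares(points, centroid(points))
    ssw = 0.0
    for label in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == label]
        ssw += sum_of_squares(members, centroid(members))
    return (sst - ssw) / sst


# Four made-up observations forming two well separated groups.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
print(r_squared(points, [0, 0, 1, 1]))  # two tight clusters: close to 1
print(r_squared(points, [0, 0, 0, 0]))  # a single cluster: no between-cluster variation
```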
For the 2006/07 HES, Figure D.1 plots RS against the number of clusters one could potentially form from the sample. For 12 clusters, the number we have chosen, RS is 0.975, indicating the clusters are very different. The graph also illustrates why we chose 12 clusters: beyond 12 the gain from an additional cluster becomes very close to zero, as the value of RS approaches its asymptote.
- Figure D.1: Cluster heterogeneity (RS) and number of clusters
- Figure D.2: Goodness of fit of the clusters
As the K-harmonic means algorithm forces every household into one of the 12 clusters, some clusters will display more within-cluster variation than others; that is, some clusters fit the data better because they contain fewer outliers. Figure D.2 reports the proportion of total within-cluster variation that is owing to each cluster. The clusters generally account for between 7% and 10% of within-cluster variation, meaning there are no extreme outliers. It is interesting to note that between the two periods cluster K becomes significantly more heterogeneous. This is hardly surprising: K represents older people still working, so as it grows with rising labour force participation among older people (as noted in Section 5.2.3, it grew 30% in size between the two periods) we would expect it to include people who are more diverse in terms of other attributes.
Appendix E The linear expenditure system and equivalent variation#
E.1 Derivation of the equivalent variation metric#
The direct utility function for the Linear Expenditure System is:
With 0 , and
. βi is marginal expenditure on good i out of total expenditure; given βi is positive it rules out inferior goods. xi and γi are respectively the total expenditure on good i and the amount of committed expenditure on good i. Committed expenditure is expenditure that is considered the basic need and is consumed no matter what the income. If pi is the price of good i and γ is total expenditure the budget constraint is:
Following Creedy and Sleeman (2006) we define the two terms A and B respectively as:
The indirect utility function, V (p, y), can be derived as:
inverting (E.5) and substituting the expenditure function E(p, U) for y you get:
Given
When prices change from p0 to p1, we can write:
assuming that total expenditure remains constant at y, this gives:
Substituting for U1 into (E.9) using equation (E.6) and rearranging slightly gives:
The term A1/A0 is a Laspeyres type of price index, using committed expenditure of good i (γi) as weights:
Where
The term B1/B0 can be simplified to
which is a weighted geometric mean of the relative prices of each good in time 0 and time 1.
Given we have estimated budget shares wi (see Section E.2) and have expenditure levels for each household type as well as observed price changes,[39] to calculate the equivalent variation all we need is piγi and βi.
Creedy and Sleeman (2006) show that:
Where ei is the elasticity of the budget share of good i to expenditure. We are able to estimate this parameter, ei for each household type; the next section describes the estimation procedure. Once ei is estimated, and given we have values for wi, we are able to calculate βi, one of our unknowns.
Given wi, βi, ξ and ei we are able to calculate ηii - the price elasticity for each household type of good i to its own price using (E.15). ξ is the Frisch parameter and denotes the elasticity of marginal utility of total expenditure with respect to total expenditure. As Creedy and Sleeman (2006) do we impose the Frisch parameter. Following Creedy and Sleeman (2006) we assume it takes a fixed value of -1.9, in the next section we test the sensitivity of our results to different values of the Frisch parameter.
Given ηii and total expenditure y we can find piγi using:
We can now calculate the equivalent variation using (E.10) with the expenditure y.
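The chain of steps from estimated budget shares to committed expenditure can be sketched as follows. The paper's own expressions are not reproduced here, so the formulas in the code are assumptions based on standard Linear Expenditure System results: the total expenditure elasticity of a good is one plus its budget share elasticity, βi is the budget share times the total expenditure elasticity, and the own-price elasticity follows Frisch's formula for additive utility. All names are hypothetical.

```python
def recover_les_parameters(y, shares, share_elasticities, frisch=-1.9):
    """Recover beta_i, own-price elasticities and committed expenditure (sketch).

    y                  -- total expenditure of the household type
    shares             -- budget shares w_i
    share_elasticities -- elasticity of each budget share w.r.t. total expenditure
    frisch             -- imposed Frisch parameter (negative)
    """
    betas, own_price, committed = [], [], []
    for w, e in zip(shares, share_elasticities):
        # Assumed identity: total expenditure elasticity = 1 + share elasticity.
        total_elasticity = 1.0 + e
        # Marginal budget share under the LES.
        beta = w * total_elasticity
        # Frisch's own-price elasticity for additive utility (assumed form).
        eta_ii = total_elasticity / frisch - beta * (1.0 + total_elasticity / frisch)
        betas.append(beta)
        own_price.append(eta_ii)
        # Committed expenditure p_i * gamma_i implied by the LES own-price elasticity.
        committed.append(y * w * (1.0 + eta_ii) / (1.0 - beta))
    return betas, own_price, committed
```

With these assumed formulas, and marginal budget shares that sum to one, total committed expenditure equals y(1 + 1/ξ), so a Frisch parameter of -1.9 implies that a little under half of total expenditure is committed.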
E.2 Estimating the budget share - expenditure elasticity#
As outlined above, we need an estimate of the budget share of good i, wi, and the elasticity of the budget share of good i with respect to expenditure, ei. Again following Creedy and Sleeman (2006), we estimate the budget share of good i for each household type using the following functional form (omitting the i subscript for each good):
As Creedy and Sleeman (2006) note, this form has the convenient property that if the parameters are estimated using ordinary least squares, the adding-up condition that the budget shares must sum to 1 across all goods holds for predicted shares at all total expenditure levels, y.
As there are 36 commodity groups (see Table A.3) and 34 household types (12 clusters and 22 categories in the hard dimension analysis), a total of 1224 (36 × 34) budget share regressions were performed. Hence the estimated budget shares for each good and each group and cluster cannot be reported here.
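The adding-up property can be checked numerically. The sketch below assumes a share equation that is linear in a constant, the log of total expenditure and its reciprocal, w = δ1 + δ2·log(y) + δ3/y; this specific form is an assumption, since the paper's equation is not reproduced here. The property itself holds for any OLS regression that includes an intercept and uses the same regressors for every good. All names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_share_equation(expenditure, shares):
    """OLS of a budget share on [1, log y, 1/y] via the normal equations."""
    rows = [[1.0, math.log(y), 1.0 / y] for y in expenditure]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtw = [sum(r[i] * w for r, w in zip(rows, shares)) for i in range(3)]
    return tuple(solve_linear(xtx, xtw))


def predict_share(coef, y):
    """Predicted budget share at total expenditure y."""
    d1, d2, d3 = coef
    return d1 + d2 * math.log(y) + d3 / y
```

Fitting one equation per good to shares that sum to one for every household, and then summing the predicted shares at any expenditure level, returns one up to rounding error. Expressing expenditure in thousands of dollars keeps the normal equations well conditioned.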
Turning to estimating the elasticity of the budget share of good i with respect to expenditure, at any given level of y, the expenditure elasticity (for a required commodity group and household type) can be expressed as:
This can be estimated using ordinary least squares; again, owing to the large number of results, these are not reported.
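Once a share equation has been fitted, the elasticity follows directly from its coefficients. The sketch below assumes the same hypothetical functional form, w = δ1 + δ2·log(y) + δ3/y, which may differ from the one used in the paper. It returns the elasticity of the budget share with respect to total expenditure; adding one gives the total expenditure elasticity of demand for the good.

```python
import math


def budget_share(coef, y):
    """Budget share at total expenditure y under the assumed form."""
    d1, d2, d3 = coef
    return d1 + d2 * math.log(y) + d3 / y


def share_elasticity(coef, y):
    """Elasticity of the budget share with respect to total expenditure.

    From dw/dy = d2 / y - d3 / y**2, so (dw/dy) * (y / w) = (d2 - d3 / y) / w.
    """
    _, d2, d3 = coef
    return (d2 - d3 / y) / budget_share(coef, y)
```

A negative value indicates a good whose budget share falls as total expenditure rises, which is the usual pattern for necessities.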
E.2.1 Sensitivity of results to Frisch parameter
Following Creedy and Sleeman (2006), we set the Frisch parameter at -1.9. The Frisch parameter is the elasticity of the marginal utility of total expenditure with respect to total expenditure, hence its plausible values are negative. As stated above, it is used to calculate the equivalent variation using the Linear Expenditure System. Figure E.1 and Figure E.2 show that the values of our calculated equivalent variation measures differ across plausible values of the Frisch parameter. However, the relative positions of the clusters, in terms of who was affected most and least, remain the same until very high (ie, less negative) values of the Frisch parameter are selected.
- Figure E.1: Movements in equivalent variation
- Figure E.2: Movements in equivalent variation normalised by expenditure
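A sensitivity check of this kind can be sketched by recomputing the equivalent variation over a grid of Frisch values. The code is an illustration under assumed standard Linear Expenditure System formulas (total expenditure elasticity equal to one plus the share elasticity, Frisch's own-price elasticity for additive utility, and marginal budget shares that sum to one). It is not the procedure used to produce the figures, and all names and inputs are hypothetical.

```python
import math


def ev_given_frisch(y, shares, share_elasticities, price_ratios, frisch):
    """Equivalent variation for one household type at a given Frisch value."""
    a0 = a1 = log_b_ratio = 0.0
    for w, e, r in zip(shares, share_elasticities, price_ratios):
        total_elasticity = 1.0 + e          # assumed identity
        beta = w * total_elasticity          # marginal budget share
        eta_ii = total_elasticity / frisch - beta * (1.0 + total_elasticity / frisch)
        committed = y * w * (1.0 + eta_ii) / (1.0 - beta)
        a0 += committed                      # committed spending at old prices
        a1 += committed * r                  # committed spending at new prices
        log_b_ratio += beta * math.log(r)    # log of B1/B0
    return y - a0 - (y - a1) / math.exp(log_b_ratio)


def frisch_sensitivity(y, shares, share_elasticities, price_ratios, frisch_values):
    """Map each Frisch value to the implied equivalent variation."""
    return {
        xi: ev_given_frisch(y, shares, share_elasticities, price_ratios, xi)
        for xi in frisch_values
    }
```

Running `frisch_sensitivity` for each household type over the same grid, and ranking the types at each grid point, would show whether the ordering of most and least affected groups changes. When all prices rise by the same proportion the result does not depend on the Frisch parameter, so any sensitivity comes from relative price changes.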
Notes
- [39] See Appendix A.