The Treasury

New Zealand Households and the 2008/09 Recession

2 Measurement issues

2.1 Defining household types

We define household type in two ways. The first follows the traditional approach in the literature of splitting households into types along a single dimension, such as age group or income quartile. We term these "hard dimensions", a term intended to evoke the a priori boundaries that the researcher sets when grouping the data under this approach. For example, in splitting the sample into age groups (26-35, 36-45, etc), the researcher implicitly assumes that a 35 year old may differ from a 36 year old, but that a 34 year old does not differ from a 35 year old.

The dimensions in the data are not statistically independent of one another. For example, income and age are correlated, people who own their home tend to have higher incomes than those who rent, and more qualified people generally have higher incomes (see Table 2.1). These relationships make it difficult to identify causality. For example, if we find that the lowest income quartile had the lowest income growth, is this because households in that quartile are more likely to be younger, or less qualified? Or is it unrelated to either? One possible solution is cross-tabulation: splitting the sample on one dimension (income), then another (age), and then a third (qualification). However, in small samples the resulting groups may be too small to ensure statistically robust results (and, in some instances, to comply with minimum sample confidentiality requirements). Given that the 2006/07 HES sample has around 2,500 households, this is a problem in our case.
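The small-sample problem can be made concrete with some rough arithmetic. The sketch below is illustrative only: the sample size is the approximate figure quoted above, the three qualification groups are those listed in Table 2.1, and the number of age groups (five) is an assumption made for the example.

```python
# Illustrative arithmetic only; the group counts are assumptions, not the
# paper's actual cross-tabulation design.
sample_size = 2500        # approximate number of households in the 2006/07 HES
income_groups = 4         # income quartiles
age_groups = 5            # ASSUMED number of age bands
qualification_groups = 3  # groups listed in Table 2.1

# Each combination of income, age and qualification is one cell.
cells = income_groups * age_groups * qualification_groups
avg_per_cell = sample_size / cells

print(f"{cells} cells, about {avg_per_cell:.1f} households per cell on average")
```

Because the dimensions are correlated, households are not spread evenly across cells, so the smallest cells would hold fewer households than this average suggests.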

Table 2.1: Mean disposable income by home ownership status, age group and qualification
                           Disposable income
                           (June 2007 year, $)
Home ownership status
  Renters                  45,751
  Mortgage holders         54,358
  Other                    75,229
Qualification
  School or none           45,910
  Bachelor degree          70,482
  Post-graduate            76,599

Our solution to both of these issues is "clustering". Clustering groups observations so that those in the same cluster are more similar to each other, across a number of dimensions, than they are to observations in other clusters. Put another way, the goal of clustering is to partition observations into clusters that are homogeneous on a number of attributes, while observations in different clusters are heterogeneous on those attributes. Clustering has three advantages. First, a small sample can be split on more dimensions than under cross-tabulation, which helps with identification issues while maintaining confidentiality and statistically adequate sample sizes. Second, rather than the researcher splitting the data on hard dimensions, the data determine the boundaries of a cluster. In our earlier example of splitting the sample by age group, clustering lets "the data decide" where the boundary lies rather than imposing it between 35 and 36. Finally, clustering captures natural correlations in the data (eg between age and income), which allows us to give intuitive descriptions of the characteristics of a cluster (eg young, high income renters can be termed young professionals).
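The idea of letting the data set the boundaries can be sketched in a few lines of code. The example below is a minimal illustration, not the method used in this paper: the text does not say which clustering algorithm is applied, so k-means is assumed here purely because it is simple, and the eight households (age, disposable income) are made-up values rather than HES data. The function names are likewise hypothetical.

```python
import random

def standardise(rows):
    """Rescale each column to mean 0 and standard deviation 1, so that
    income (in dollars) does not dominate age (in years) in the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, n_iter=50, seed=0):
    """Plain k-means (an assumed algorithm, chosen for illustration)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each observation to its nearest cluster centre.
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                  for p in points]
        # Move each centre to the mean of the observations assigned to it.
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append([sum(c) / len(members)
                                    for c in zip(*members)])
            else:
                new_centres.append(centres[j])  # keep an empty cluster's centre
        if new_centres == centres:  # assignments have stopped changing
            break
        centres = new_centres
    return labels, centres

# Synthetic households: (age of reference person, disposable income in $).
households = [(25, 38000), (28, 42000), (31, 45000), (27, 40000),
              (52, 78000), (58, 82000), (55, 75000), (60, 85000)]

labels, centres = k_means(standardise(households), k=2)
for household, label in zip(households, labels):
    print(household, "-> cluster", label)
```

No age or income cut-off is supplied anywhere in the code. The split between the younger, lower income households and the older, higher income households emerges from the distances between observations, using both attributes at once.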
