The Treasury

Comparing the Household Economic Survey to administrative records: An analysis of income and benefit receipt

3.1  Interpreting the differences between the two data sources

There are a number of reasons why the survey data might differ from the administrative data.

Conceptual differences: The income measured in HES and in IRD data is not directly comparable, because some categories of income in HES (such as overseas income) are not collected by IRD.[11] To compare the two sources, we developed a concordance between them (detailed in the next section). HES income can be matched to IRD income in the following categories: wages and salaries, self-employment, pensions, benefits, paid parental leave, student allowances, sole-trader income (including rental income) and partnership income. Our measure of total income consists of these matched categories, excluding benefit income. The benefit data are analysed separately in section 5.

Table 1 and Table 2 show that 82% of HES income and 96% of IRD income are included in our defined total comparable income.[12]
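The definition of total comparable income above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the category labels, the function names and the dollar amounts are hypothetical, and the actual concordance is the one detailed in the next section.

```python
# Hypothetical labels for the matched categories listed in the text.
MATCHED_CATEGORIES = {
    "wages_and_salaries",
    "self_employment",
    "pensions",
    "benefits",
    "paid_parental_leave",
    "student_allowances",
    "sole_trader",  # includes rental income
    "partnership",
}

# Benefit income is matched but analysed separately, so it is left out of the total.
EXCLUDED_FROM_TOTAL = {"benefits"}


def total_comparable_income(income_by_category):
    """Sum income over the matched categories, excluding benefit income."""
    comparable = MATCHED_CATEGORIES - EXCLUDED_FROM_TOTAL
    return sum(
        amount
        for category, amount in income_by_category.items()
        if category in comparable
    )


def comparable_share(income_by_category):
    """Share of all recorded income that falls into total comparable income."""
    total = sum(income_by_category.values())
    if total == 0:
        return 0.0
    return total_comparable_income(income_by_category) / total


# Made-up amounts for one person, for illustration only.
example = {
    "wages_and_salaries": 40000.0,
    "benefits": 5000.0,
    "overseas_income": 3000.0,    # not collected by IRD, so unmatched
    "investment_income": 2000.0,  # unmatched
}

print(total_comparable_income(example))
print(round(comparable_share(example), 2))
```

Applied to each source in aggregate, a share of this kind corresponds to the 82% (HES) and 96% (IRD) figures reported in Table 1 and Table 2.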

Linkage error: Some of the people in HES will have been linked to the wrong person in the IDI. Statistics New Zealand estimates that about 1.4% of people linked in HES are linked to the wrong person.[13]

Errors in the survey data: These occur when people incorrectly report their income in HES. For example, respondents may not remember how much they earned and guess, report income in the wrong category, round income up, or deliberately misreport benefit receipt or income (for example, when they feel stigmatised).

Errors in the administrative data: These could occur when the data are processed (including when they are loaded into the IDI) and could relate to either the amount or the timing of recorded earnings.

Administrative data also do not include 'under the table' earnings that are not reported to IRD but may or may not be reported in HES.[14]

When we observe differences between the survey and administrative data, it is not possible to determine with certainty which of the above factors is responsible. In general, which data set is best suited depends on the research question. For example, the case for preferring IRD income to HES income for tax and welfare modelling can be made independently of the comparative analysis: IRD income, not survey income, is the basis for assessing tax liability, so it is more likely than survey-measured income to reflect the government's tax revenue.

Despite the difficulty of assigning a particular reason to individual discrepancies, it is still useful to analyse these differences in order to assess their extent and nature. For example, the more similar the two data sets are, the less likely the results of any analysis are to depend on which data set is used.


  • [11] Conceptual differences are not likely to be a problem for the benefit data.
  • [12] Of the 18% of HES income that is not compared, about a third (6.2 percentage points) is income from benefits (which are compared elsewhere in this document), 3.6 percentage points is investment income, 2.6 percentage points is income earned overseas, 2.5 percentage points is income classified as other regular income and 3 percentage points is irregular income (see Table 1). The 4% of income from the data.income_tax_yr_summary table that we decided not to include in total comparable income consists almost exclusively of benefit payments (3.3%), with the remainder being ACC payments (0.6%; see Table 2).
  • [13] This figure comes from Statistics New Zealand documentation available in the IDI. For details on how links and the link quality are determined, see Statistics New Zealand (2013) and especially Statistics New Zealand (2014).
  • [14] Since tax is not paid on this illegal income, excluding it when modelling tax takings is appropriate.
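As a quick arithmetic check on the breakdown in footnote 12, the short Python sketch below sums the reported components. The figures are those quoted in the footnote; the dictionary keys and variable names are labels chosen here for illustration. The component sums come out slightly below the rounded totals of 18% and 4%, presumably because each component is itself rounded.

```python
# Percentage points of HES income excluded from total comparable income (footnote 12).
hes_excluded = {
    "benefits": 6.2,
    "investment income": 3.6,
    "income earned overseas": 2.6,
    "other regular income": 2.5,
    "irregular income": 3.0,
}

# Percentage points of IRD income excluded from total comparable income (footnote 12).
ird_excluded = {
    "benefit payments": 3.3,
    "ACC payments": 0.6,
}

hes_total = round(sum(hes_excluded.values()), 1)
ird_total = round(sum(ird_excluded.values()), 1)

# Benefits as a fraction of the excluded HES income ("about a third").
benefit_fraction = hes_excluded["benefits"] / hes_total

print(hes_total, ird_total, round(benefit_fraction, 2))
```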