The Treasury

Regression Estimates of the Elasticity of Taxable Income and the Choice of Instrument

Appendix B: The Inland Revenue Data

The data used in this paper consist of personal income information drawn from the New Zealand Inland Revenue Department's (IRD's) tax returns and employer PAYE records. The database is a stratified random sample that includes 2 per cent of all wage and salary earners (a group that also covers people receiving taxable welfare benefits) and 10 per cent of all other individual taxpayers, such as the self-employed. The database omits individuals with no personal taxable income (unless they filed a tax return), and those whose only income came from investments with the correct amount of tax deducted at source and no requirement to file a tax return. The former group is not of interest for this study, and the latter is expected to be fairly small and to account for a very small proportion of total taxable income. The database also excludes income that is not attributed to natural persons, such as income held in companies or trusts.

Randomness is ensured by sampling taxpayers on the last two digits of their unique 'IRD number'. IRD numbers are issued broadly sequentially, so these digits do not reflect the characteristics of the individual. To ensure that the sample is representative of the total individual taxpayer population, each observation is weighted according to the characteristics of the individual. For 1999, the database includes a total sample of 138,464 individual taxpayers, representing a population of 2,800,528 taxpayers. For 2002, the sample size increases to 139,420, representing a taxpayer population of 2,962,200.
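
The sampling and weighting scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not IRD's procedure: the specific two-digit endings in `SELECTED_ENDINGS`, the stratum labels and the `Taxpayer` record are hypothetical, since the text gives only the sampling rates. Each sampled record receives a design weight equal to the inverse of its stratum's sampling fraction (50 for wage and salary earners, 10 for other taxpayers); the weights actually applied may also adjust for other characteristics of the individual.

```python
from dataclasses import dataclass

# Hypothetical two-digit endings selected for each stratum. The text states only
# the sampling rates (2 and 10 per cent), not which endings IRD actually uses.
SELECTED_ENDINGS = {
    "wage_salary": set(range(0, 2)),  # 2 of 100 endings -> 2 per cent sample
    "other": set(range(0, 10)),       # 10 of 100 endings -> 10 per cent sample
}


@dataclass
class Taxpayer:
    ird_number: int
    stratum: str  # hypothetical label: "wage_salary" or "other"


def is_sampled(tp: Taxpayer) -> bool:
    """Select a taxpayer when the last two digits of the IRD number are in the stratum's set."""
    return tp.ird_number % 100 in SELECTED_ENDINGS[tp.stratum]


def design_weight(stratum: str) -> float:
    """Inverse sampling fraction: the number of taxpayers each sampled record represents."""
    return 100 / len(SELECTED_ENDINGS[stratum])


def weighted_population(sample: list[Taxpayer]) -> float:
    """Estimate the population size by summing design weights over the sample."""
    return sum(design_weight(tp.stratum) for tp in sample)


if __name__ == "__main__":
    # Synthetic population: 10,000 wage and salary earners and 10,000 other taxpayers.
    population = [Taxpayer(n, "wage_salary") for n in range(10_000)]
    population += [Taxpayer(n, "other") for n in range(10_000, 20_000)]
    sample = [tp for tp in population if is_sampled(tp)]
    print(len(sample), weighted_population(sample))  # 1200 20000.0
```

As a rough consistency check on the 1999 figures, 2,800,528 / 138,464 ≈ 20.2 taxpayers per sampled record, which lies between the two stratum weights, as it must for a sample mixing both groups under purely stratum-based weights.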

The database covers the years 1994 to 2009 and allows users to follow individuals over time using their IRD number. Because filing requirements have changed over time, the dataset contains a number of structural breaks. These include a break within the 1999–2002 period considered here, when the pre-populated personal tax summary (PTS) replaced the old IR5 tax return. This had a minor impact on some of the income tax data collected, particularly with regard to dividend and interest income below a small threshold. Aside from salary and wage income, the database also includes data on business income, trust income, interest, dividends, rental income, shareholder-employee salary, partnership income and other income. Expenses and losses claimed (including those through loss attributing qualifying companies, or LAQCs) are also recorded, as is demographic information such as date of birth and gender. These data come from a range of sources, mostly tax returns submitted to the IRD.
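
Following individuals across years amounts to keying each yearly record on the IRD number. The sketch below is a minimal illustration assuming a simplified record of `(ird_number, income_year, taxable_income)`; the record layout and the names `build_panel` and `income_history` are hypothetical, and the real extract carries many more income fields.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical simplified record: (ird_number, income_year, taxable_income).
Record = Tuple[int, int, float]


def build_panel(records: Iterable[Record]) -> Dict[int, Dict[int, float]]:
    """Group yearly records by IRD number so each individual can be followed over time."""
    panel: Dict[int, Dict[int, float]] = defaultdict(dict)
    for ird_number, income_year, taxable_income in records:
        panel[ird_number][income_year] = taxable_income
    return dict(panel)


def income_history(
    panel: Dict[int, Dict[int, float]], ird_number: int, years: List[int]
) -> List[Optional[float]]:
    """Return the individual's income for each requested year, or None where no record exists."""
    person = panel.get(ird_number, {})
    return [person.get(year) for year in years]


records = [
    (101, 1999, 40_000.0),
    (101, 2002, 45_000.0),
    (202, 1999, 30_000.0),  # synthetic: no 2002 record for this individual
]
panel = build_panel(records)
print(income_history(panel, 101, [1999, 2002]))  # [40000.0, 45000.0]
print(income_history(panel, 202, [1999, 2002]))  # [30000.0, None]
```

Linking records by IRD number does not remove the PTS/IR5 break: comparisons of affected components, such as small amounts of dividend and interest income, across 1999 and 2002 still need care.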

For the regressions in this study, several restrictions are applied to the data. First, because unrelated behavioural changes may bias the results, taxpayers who were younger than 25 in 1999 or older than 64 in 2002 are removed from the sample. This fairly common restriction removes taxpayers likely to be in the very early stages of a career, as well as those likely to have retired at age 65 (the age of eligibility for New Zealand superannuation). Second, taxpayers with 1999 taxable income below $16,000 or above $1,000,000 are excluded. The lower threshold is particularly important because it removes a significant segment of the population receiving some form of government benefit; benefit abatement means these individuals face different effective marginal tax rates from standard taxpayers. Finally, the sample is restricted to individuals with sufficient data in all six relevant income years (the years ending 1998, 1999, 2002, 2003, 2004 and 2005). Some taxpayers entered or exited the tax system over this period, so their income dynamics cannot be estimated. A number of smaller restrictions are also imposed, such as removing zero or negative taxable income values and data entry errors (such as negative ages). Combined, these restrictions reduce the sample to 38,744 taxpayers who, when weighted up to the population, represent 803,920 individual taxpayers (29 per cent of the original 1999 weighted sample).
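
The restrictions above can be expressed as a filter over linked individual records. This is a minimal sketch, not the paper's code: the `Person` record and its fields (ages in 1999 and 2002, a sampling weight, and taxable income by income year) are hypothetical, and applying the positive-income check to all six years is an assumption, since the text does not say which years it covers.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Income years (ending 1998, 1999, 2002, 2003, 2004 and 2005) that must all have data.
REQUIRED_YEARS = (1998, 1999, 2002, 2003, 2004, 2005)
MIN_INCOME_1999 = 16_000
MAX_INCOME_1999 = 1_000_000


@dataclass
class Person:
    age_1999: int
    age_2002: int
    weight: float  # sampling weight
    taxable_income: Dict[int, float] = field(default_factory=dict)  # income year -> income


def passes_restrictions(p: Person) -> bool:
    # Age: at least 25 in 1999 and no older than 64 in 2002.
    # A negative 1999 age (a data entry error) also fails this check.
    if p.age_1999 < 25 or p.age_2002 > 64:
        return False
    # Data must be present in all six income years.
    if any(year not in p.taxable_income for year in REQUIRED_YEARS):
        return False
    # Assumption: zero or negative taxable income in any required year is removed.
    if any(p.taxable_income[year] <= 0 for year in REQUIRED_YEARS):
        return False
    # 1999 taxable income between $16,000 and $1,000,000 inclusive.
    return MIN_INCOME_1999 <= p.taxable_income[1999] <= MAX_INCOME_1999


def apply_restrictions(people: List[Person]) -> List[Person]:
    """Keep only individuals who satisfy every restriction."""
    return [p for p in people if passes_restrictions(p)]


def weighted_share(kept: List[Person], original_weighted_total: float) -> float:
    """Share of the original weighted population represented by the restricted sample."""
    return sum(p.weight for p in kept) / original_weighted_total
```

With the totals reported above, the weighted share is 803,920 / 2,800,528 ≈ 0.287, consistent with the 29 per cent figure.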
