The Treasury

Survey Reweighting for Tax Microsimulation Modelling - WP 03/17

1  Introduction

Tax microsimulation models are based on large-scale cross-sectional survey data. Each individual or household has a sample weight provided by the statistical agency responsible for collecting the data. The typical starting point is to use weights that are inversely related to the probability of selecting the individual in a random sample, with some adjustment for non-response. It has become common for agencies, using ‘minimal’ adjustments, to produce revised weights to ensure that, for example, the estimated population age/gender distributions match population totals obtained from other sources, in particular census data. Such calibration methods appear to be well known among survey statisticians, a highly influential paper being that by Deville and Särndal (1992).[1]

Users of official data usually take the weights as given when ‘grossing up’ from the sample to obtain estimates of population values. The weights are used not only for simple aggregates, such as aggregate income taxation, the number of recipients of a particular social transfer, or the number of people in a particular age group, but also in estimating measures of population inequality or poverty. However, there is no guarantee that weights calibrated on demographic variables produce appropriate revenue, expenditure and income distribution results.
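As a concrete illustration of ‘grossing up’, the following minimal Python sketch computes weighted population estimates from a small sample. The four records, the field names (`weight`, `tax`, `on_benefit`) and the helper `grossed_up_total` are hypothetical and are used only for illustration; they are not taken from the HES or from any official data set.

```python
# Hypothetical sample: each record carries the agency-supplied sample weight,
# the income tax paid, and an indicator of receipt of a social transfer.
sample = [
    {"weight": 1200.0, "tax": 5400.0, "on_benefit": False},
    {"weight": 950.0, "tax": 0.0, "on_benefit": True},
    {"weight": 1100.0, "tax": 12800.0, "on_benefit": False},
    {"weight": 1300.0, "tax": 2100.0, "on_benefit": True},
]


def grossed_up_total(records, key):
    """Estimate a population total as the weighted sum of a sample variable."""
    return sum(r["weight"] * float(r[key]) for r in records)


# The sum of the weights is the estimated population size.
population = sum(r["weight"] for r in sample)
# Weighted sum of tax paid: the estimated aggregate income tax.
total_tax = grossed_up_total(sample, "tax")
# Weighted sum of a 0/1 indicator: the estimated number of recipients.
recipients = grossed_up_total(sample, "on_benefit")
```

Every estimate is a weighted sum, so any change to the weights changes all of the aggregates at once. This is why weights calibrated only on demographic totals need not reproduce revenue or expenditure totals.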

One aim of this paper is therefore to describe the basic calibration approach for economic modellers who are not familiar with the survey literature but need to reweight their samples. This need may arise, for example, if survey estimates of population aggregates that were not used in the official calibration are not sufficiently close to the values obtained from other data sources, such as tax and benefit administration data. A further important reason for wanting to reweight the data arises when a survey from one year is used to examine the likely implications of, say, a tax and transfer policy in a later year. This can happen if cross-sectional surveys are not carried out every year or if there are long delays in releasing data, while other administrative data are available at more frequent intervals. It is also useful to be able to allow for changes over time in, say, the age distribution of the population or in aggregate unemployment rates.

The basic problem of obtaining ‘minimum distance’ weights is described more formally in section 2. With the chi-squared distance function the problem has an explicit solution, which is derived in section 3.[2] A more general class of distance measures, for which iterative solutions are needed, is discussed in section 4. These sections provide a simplified exposition, with derivations, of some of the results stated by Deville and Särndal (1992), whose more sophisticated and comprehensive treatment concentrated on statistical inference issues.[3] The use of Newton’s method for the solution of the nonlinear equations is explored. Numerical examples, based on a small hypothetical sample, are used to compare alternative distance functions. Finally, in section 5 the methods are applied to New Zealand Household Economic Survey (HES) data. Brief conclusions are in section 6.
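To give a flavour of the explicit solution before the formal treatment, the following Python sketch implements the standard linear calibration result for the chi-squared distance. It is a sketch under stated assumptions, not the derivation of section 3 or the author's Fortran programs. The notation is assumed here: s_k are the original weights, x_k the vector of calibration variables for record k, and t the vector of required totals. The new weights are w_k = s_k(1 + x_k'λ), where λ solves (Σ s_k x_k x_k')λ = t − Σ s_k x_k. The function names and the four-record sample are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def calibrate_chi2(s, x, t):
    """Return weights w_k = s_k (1 + x_k' lam) whose weighted totals of x equal t."""
    j = len(t)
    pairs = list(zip(s, x))
    # Totals of the calibration variables under the original weights.
    totals = [sum(sk * xk[i] for sk, xk in pairs) for i in range(j)]
    # Weighted cross-product matrix, sum over k of s_k x_k x_k'.
    cross = [[sum(sk * xk[a] * xk[b] for sk, xk in pairs) for b in range(j)]
             for a in range(j)]
    lam = solve(cross, [t[i] - totals[i] for i in range(j)])
    return [sk * (1.0 + sum(l * v for l, v in zip(lam, xk))) for sk, xk in pairs]


# Hypothetical sample: x_k = (1, employed indicator), original weights of 100.
s = [100.0, 100.0, 100.0, 100.0]
x = [[1.0, 1.0], [1.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
t = [420.0, 230.0]  # assumed external totals: population and number employed

w = calibrate_chi2(s, x, t)
new_totals = [sum(wk * xk[i] for wk, xk in zip(w, x)) for i in range(2)]
```

In this example the original weights imply a population of 400 with 200 employed; the calibrated weights reproduce the assumed totals of 420 and 230 while remaining as close as possible, in the chi-squared sense, to the original weights.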

Notes

  • [1] A detailed description of calibration and Generalised Regression (GREG) methods used in Belgium is given in Vanderhoeft (2001), which also describes the SPSS-based program g-CALIB-S. Bell (2000) describes methods used in the Australian Bureau of Statistics household surveys, involving the SAS software GREGWT. Statistics Sweden uses the SAS software CLAN, described by Andersson and Nordberg (1998) and also used by the Finnish Labour Force Survey. All results in the present paper were obtained using Fortran programs written by the author.
  • [2] The link between this method and Generalised Regression estimators of population totals is discussed briefly at the end of the section. See especially Särndal et al. (1992).
  • [3] Deville and Särndal (1992) used fewer than two pages to state the results discussed here.