
4  Models

The following analysis uses four main models: two linear regression models and two logistic models. Results from the linear regression models can be found in Section 5: Results from core models. Results from the logistic models can be found in Section 6: Results from logistic models.

4.1  The core models

In order to control for variation in wealth not directly associated with variation in health, a series of control variables were used. These were grouped in two linear regression models, referred to as core models one and two. The core models were constructed to be used as a basis from which the measures of health could be considered.

Core model one includes age (and its square root), income, geographic region, ethnicity, highest qualification achieved, housing tenure, deprivation, gender and the composition of the household. These variables were chosen because they are relatively independent of one another.

Core model two builds on core model one and also includes variables for whether the respondent was born in New Zealand, the number of years of paid employment, smoking habits, benefit receipt and being a student. The core models have the following form:

For core model one: ln(NW) = ƒ(H, Z1)

For core model two: ln(NW) = ƒ(H, Z1, Z2)

Where:

NW = Net wealth scores

H = The particular health variable being considered

Z1 = The control variables included in core model one

Z2 = The new control variables introduced in core model two
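As an illustration of the functional form above, the following minimal sketch fits ln(NW) on a single health indicator H and a few Z1-style controls by ordinary least squares. Everything in it is an assumption made for illustration: the data are synthetic (not SoFIE data), the coefficients used to generate them are invented, survey weights are omitted, and the standard errors use a simple White-type (HC0) estimator that may differ from the robust estimator the authors ran in Stata.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins (NOT SoFIE data): a health indicator H and Z1-style controls.
health = rng.integers(0, 2, size=n).astype(float)        # H: hypothetical 0/1 health flag
age = rng.uniform(20, 80, size=n)
income_k = rng.lognormal(mean=3.6, sigma=0.5, size=n)    # income in thousands (hypothetical)

# Invented coefficients, used only to generate a positive net wealth series.
log_nw = (2.0 - 0.3 * health - 0.05 * age + 1.5 * np.sqrt(age)
          + 0.01 * income_k + rng.normal(0, 0.5, size=n))
net_wealth = np.exp(log_nw)

# Design matrix: intercept, H, then Z1 = (age, sqrt(age), income).
X = np.column_stack([np.ones(n), health, age, np.sqrt(age), income_k])
y = np.log(net_wealth)

# Ordinary least squares fit of ln(NW) on H and Z1.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Heteroskedasticity-robust (HC0) covariance: (X'X)^-1 X' diag(e^2) X (X'X)^-1.
resid = y - X @ beta
bread = np.linalg.inv(X.T @ X)
meat = X.T @ (X * resid[:, None] ** 2)
robust_se = np.sqrt(np.diag(bread @ meat @ bread))

print("coefficient on H:", beta[1], "robust s.e.:", robust_se[1])
```

Extending the sketch to core model two would amount to appending the Z2 columns to the design matrix.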

All control variables were taken from Wave 2. The variables included in core model two but not in core model one were initially excluded because they were unlikely to be independent of the other control variables. For example, whether the respondent was born in New Zealand was expected to be correlated with ethnicity. Years of paid employment and whether or not the respondent was a student were expected to be correlated with age.

Where control variables could be expressed in different forms, the significance and clarity of the variable were used to determine which form was included in the models. For completeness, the variables specified above were included in the models even if they were not found to be significant at the 10% level. Full regression tables of the core models can be found in Appendix A.

After modelling net wealth as the dependent variable, total assets and total liabilities were modelled with the same set of control variables. The aim was to identify whether changes in wealth were driven more by changes in assets or changes in liabilities.

Statistics New Zealand requires that all output is censored. All output has been weighted and counts rounded to the nearest hundred. Weighted counts of less than 1,000 are not released. Percentages are calculated after censorship. Both core models were run estimating robust standard errors. Data access was restricted to the Statistics New Zealand Datalab, where analysis was conducted using Stata Version 9.
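The release rules for counts described above can be sketched as a small helper. The function name is hypothetical, and two details are assumptions rather than statements of Statistics New Zealand policy: that the 1,000 threshold is checked against the unrounded weighted count, and that ties round upwards.

```python
def release_count(weighted_count):
    """Return a releasable count, or None when the count must be suppressed.

    Assumption: suppression is checked against the unrounded weighted count.
    """
    if weighted_count < 1000:
        return None  # below the release threshold, so not released
    # Round half up to the nearest hundred (built-in round() rounds ties to even).
    return int(weighted_count / 100 + 0.5) * 100


for raw in (999, 1000, 1234.4, 1250):
    print(raw, "->", release_count(raw))
```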

4.1.1  Interpreting the results

The variables used in this study, including the health variables, are correlated with one another. Isolating these variables gives a simplistic representation of the factors that contribute to net wealth. In particular, we ignore the range of possible interactions between the health variables and the control variables, and interactions of the health variables with each other.

Many of the results from comparing health and wealth are given in terms of marginal effects. In the following discussion the marginal effects are referred to as “effects” on net wealth. It is important to keep in mind that these models do not establish a causal link between the explanatory variables and wealth.

All marginal effects have been calculated at the mean value of the regressors. This means they apply to a theoretical person of mean age, income and number of years of paid employment. For categorical variables this theoretical person matches the weighted sample proportions. For example, because 78.7% of the longitudinal population were born in New Zealand and 21.3% of the longitudinal population were born overseas, the theoretical person uses 78.7% of the coefficient for being born in New Zealand and 21.3% of the coefficient for being born overseas. The means and proportions of the control variables used to calculate the marginal effects were computed from the entire longitudinal population and can be found in Appendix A, Appendix Table 3.
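The construction of the theoretical person can be sketched as follows. The 78.7% and 21.3% shares are the ones quoted above; every coefficient and the mean age are hypothetical placeholders, not values from the paper's regression tables. Expressing the effect as a difference in back-transformed wealth is one common convention and is an assumption here, not necessarily the authors' exact calculation.

```python
import math

# Hypothetical coefficients (NOT taken from the paper's regression tables).
coef = {
    "intercept": 4.0,
    "age": -0.05,
    "sqrt_age": 1.2,
    "born_nz": 0.10,        # being born in New Zealand
    "born_overseas": 0.0,   # treated as the reference category
    "health": -0.25,        # the health variable H under consideration
}

mean_age = 45.0  # hypothetical mean age
# Weighted population shares quoted in the text.
share = {"born_nz": 0.787, "born_overseas": 0.213}


def baseline_log_wealth(coef, mean_age, share):
    """Linear predictor for the theoretical person at the regressor means."""
    value = coef["intercept"]
    value += coef["age"] * mean_age
    value += coef["sqrt_age"] * math.sqrt(mean_age)  # square root of mean age
    # Each categorical coefficient is weighted by its population share.
    for category, proportion in share.items():
        value += proportion * coef[category]
    return value


base = baseline_log_wealth(coef, mean_age, share)
# Change in back-transformed net wealth when the health indicator switches on.
effect = math.exp(base + coef["health"]) - math.exp(base)
print(f"baseline ln(NW) = {base:.4f}, effect on NW = {effect:,.0f}")
```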

The regression methods used are known as mean regressions. When each explanatory variable is set at its regression-sample mean, the value of the dependent variable estimated by the model is the sample mean of the dependent variable. Natural logarithms were used to transform net wealth before analysis. This greatly reduced the effect of outliers on the mean, though the median was unchanged by the transformation. As the model estimates the log of net wealth, the model's mean estimate will be the mean of the log of net wealth which, when transformed back, will be below the mean value of net wealth.
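The gap between the back-transformed mean of the logs and the mean itself can be seen in a small simulation. The values below are synthetic draws from a right-skewed distribution chosen purely for illustration; they are not SoFIE data.

```python
import math
import random
import statistics

random.seed(1)
# Synthetic, right-skewed "net wealth" values (odd count so the median is a single value).
wealth = [random.lognormvariate(11.0, 1.5) for _ in range(10_001)]
log_wealth = [math.log(w) for w in wealth]

# Mean of the logs, transformed back: this is the geometric mean.
back_transformed = math.exp(statistics.fmean(log_wealth))
arithmetic_mean = statistics.fmean(wealth)

# The logarithm preserves rank order, so the median maps straight through.
median_via_logs = math.exp(statistics.median(log_wealth))
median_direct = statistics.median(wealth)

print(f"exp(mean of logs) = {back_transformed:,.0f}")
print(f"arithmetic mean   = {arithmetic_mean:,.0f}")
print(f"median (via logs) = {median_via_logs:,.0f}, median (direct) = {median_direct:,.0f}")
```

The back-transformed figure sits below the arithmetic mean for any sample that is not constant, which is why the model's mean estimate should not be read as mean net wealth.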

There are two age terms in the model: age and the square root of age.[10] For interpretation and application of these models the square root of mean age has been used in the calculation of the marginal effects. The estimates therefore apply to someone of mean age.[11]
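The consequence of using the square root of mean age, rather than the mean of the square root of age, follows from a short inequality. The notation below is introduced here for illustration: $a$ denotes age, $\bar{a}$ its mean, and $\beta_{\sqrt{a}}$ the coefficient on the square root of age.

```latex
\begin{align*}
\operatorname{Var}\!\left(\sqrt{a}\right)
  &= \mathbb{E}[a] - \left(\mathbb{E}\!\left[\sqrt{a}\right]\right)^{2} \;\ge\; 0 \\
\Longrightarrow\quad
\sqrt{\bar{a}} &= \sqrt{\mathbb{E}[a]} \;\ge\; \mathbb{E}\!\left[\sqrt{a}\right] \\
\Longrightarrow\quad
\beta_{\sqrt{a}}\,\sqrt{\bar{a}} &\;\ge\; \beta_{\sqrt{a}}\,\mathbb{E}\!\left[\sqrt{a}\right]
  \qquad \text{when } \beta_{\sqrt{a}} > 0 .
\end{align*}
```

Equality holds only if everyone in the sample has the same age, so with a positive coefficient the base level used for the marginal effects lies slightly above the mean of the dependent variable.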

4.1.2  Predicted values

Comparison of the observed logarithms of net wealth and those estimated by the models shows the model tends to provide reasonably good estimates for the population. Table 4 gives the percentiles of the logarithm of net wealth and the percentiles of the predicted logarithm of net wealth from core model two.

Table 4 – Comparison of actual values to those predicted by core model two
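| Percentile | 5% | 10% | 25% | Median | 75% | 90% | 95% |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Actual net wealth (log) | 7.438 | 8.412 | 9.953 | 11.333 | 12.201 | 12.863 | 13.321 |
| Predicted net wealth (log) | 8.034 | 8.678 | 10.055 | 11.311 | 11.983 | 12.427 | 12.669 |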

Source: SoFIE Waves 1–3, OSMs, longitudinal weights, supplied by Statistics New Zealand

For longitudinal respondents at the 5th and 95th wealth percentiles, the predicted logarithm of net wealth tended to differ more from the actual value than at any other percentile of the population.

Figure 2 graphs the percentiles of net wealth, without logarithms, from Table 4. It should be noted that when transforming from the predicted logarithm of net wealth to predicted net wealth, what seem like minor differences between the logarithms become much larger differences between the transformed values. This is owing to the shape of the exponential curve. It would be prudent to limit the application of these models to levels of net wealth below \$400,000.
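The effect of the back-transformation can be checked directly from the Table 4 values. The sketch below exponentiates each percentile and reports the gap in logs, in dollars and as a ratio; only the table figures come from the paper, and the variable names are illustrative.

```python
import math

# Log net wealth percentiles copied from Table 4 (core model two).
percentiles = ["5%", "10%", "25%", "Median", "75%", "90%", "95%"]
actual_log = [7.438, 8.412, 9.953, 11.333, 12.201, 12.863, 13.321]
predicted_log = [8.034, 8.678, 10.055, 11.311, 11.983, 12.427, 12.669]

ratios = {}
for label, a, p in zip(percentiles, actual_log, predicted_log):
    actual, predicted = math.exp(a), math.exp(p)
    ratios[label] = actual / predicted
    print(f"{label:>6}: log gap {a - p:+.3f}, "
          f"dollar gap {actual - predicted:+,.0f}, ratio {ratios[label]:.2f}")
```

At the median the log gap of 0.022 corresponds to a difference of about 2 percent, whereas at the 95th percentile the gap of 0.652 means actual net wealth is roughly 1.9 times the predicted value (approximately 610,000 against 318,000 dollars).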

Notes

• [10] Despite the use of age and age² being “standard”, preliminary testing suggested that age and √age were preferable.
• [11]This means that the estimated level of the dependent variable used as the base to calculate marginal effects may slightly exceed its mean value as the square root of mean age is larger than the mean of the square root of age, and the estimated coefficient of the square root of age is positive.