The Treasury

2.2  The Use of Synthetic Cohorts

To study the lifecycle profiles of saving, we would ideally have panel data, in which the same people are tracked over time. However, the available panel surveys in New Zealand are restricted to cohorts of young people born in the 1970s, and so are unsuitable for studying lifecycle phenomena.[7] Instead, the availability of a time series of cross-sectional Household Economic Surveys (HES) allows us to construct synthetic panels following methods described by Shorrocks (1975) and Deaton (1985).

The key idea with synthetic panels is to divide the sample into groups whose membership is assumed to be fixed over time. The average behaviour of each group is then tracked over time. Provided that each cross-section remains representative of a population whose composition is fixed, estimates from these synthetic panel data should be consistent with estimates from genuine panel data on individuals.[8]

In the context of saving behaviour, the synthetic panel method requires that we form cohorts defined by date of birth and then follow them across successive Household Economic Surveys. Provided the population is not much affected by migration, and provided that a cohort is not so old that its members are dying in significant numbers, each successive survey lets us track movements in the average behaviour of each cohort over time (Deaton, 1997). For example, we can link the average saving rate of people who are 30 years old in the 1985 survey to the average saving rate of those who are 31 years old in the 1986 survey, because both averages refer to the cohort born in 1955. Not only may these averages have many of the properties of panel data, they may also avoid some of its problems.
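The cohort bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the record layout `(survey_year, age_of_head, saving_ratio)`, the helper names `birth_year` and `cohort_cell_means`, and the values are all hypothetical, and the head's year of birth is approximated as survey year minus age.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical household records: (survey_year, age_of_head, saving_ratio).
# Layout and values are illustrative only, not taken from the HES.
records = [
    (1985, 30, 0.20), (1985, 30, 0.10), (1985, 45, 0.40),
    (1986, 31, 0.30), (1986, 31, 0.25), (1986, 46, 0.35),
]

def birth_year(survey_year: int, age: int) -> int:
    """Approximate year of birth of the household head (ignores birthdays)."""
    return survey_year - age

def cohort_cell_means(rows):
    """Average saving ratio in each (birth-year cohort, survey year) cell."""
    cells = defaultdict(list)
    for year, age, ratio in rows:
        cells[(birth_year(year, age), year)].append(ratio)
    return {key: mean(values) for key, values in cells.items()}

cells = cohort_cell_means(records)
# The 1955 cohort is observed at age 30 in 1985 and at age 31 in 1986.
print(sorted(key for key in cells if key[0] == 1955))  # [(1955, 1985), (1955, 1986)]
```

Linking the cells that share a birth year across survey years gives one cohort's path through time, which is what a genuine panel would provide for individuals.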

In particular, cohort data are constructed from fresh samples each year, so problems of sample attrition should be less severe. There may also be less bias due to measurement error, because we are typically working with a cohort mean or median (or some other quantile), which should reduce the impact of the idiosyncratic variability that is a feature of data on individuals.
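The noise-reduction part of this argument can be illustrated with a small simulation. It assumes, purely for illustration, that each household's saving ratio equals a common cohort value plus independent, zero-mean idiosyncratic noise; the parameter values are hypothetical. Under that assumption the spread of a cell average falls roughly in proportion to one over the square root of the cell size. The sketch says nothing about measurement error that is common to a whole cohort, or about the errors-in-variables issue raised in note [8].

```python
import random
from statistics import mean, stdev

# Hypothetical parameters chosen for illustration, not estimates from the HES.
TRUE_COHORT_MEAN = 0.25   # the cohort's "true" average saving ratio
NOISE_SD = 0.60           # sd of idiosyncratic household-level variation
CELL_SIZE = 250           # close to the average cell size reported in Table 2
N_CELLS = 2000            # number of simulated cells

random.seed(1)

def household_ratio() -> float:
    """One household's observed saving ratio: cohort mean plus independent noise."""
    return TRUE_COHORT_MEAN + random.gauss(0.0, NOISE_SD)

def cell_mean(size: int) -> float:
    """Average saving ratio over one cell of `size` households."""
    return sum(household_ratio() for _ in range(size)) / size

cell_means = [cell_mean(CELL_SIZE) for _ in range(N_CELLS)]
cell_sd = stdev(cell_means)

print(f"household-level sd:  {NOISE_SD:.3f}")
print(f"sd of cell means:    {cell_sd:.3f}")
print(f"theory sd / sqrt(n): {NOISE_SD / CELL_SIZE ** 0.5:.3f}")
```

With a household-level standard deviation of 0.60 and 250 households per cell, the cell means should have a standard deviation close to 0.60/√250 ≈ 0.038.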

However, there are at least three practical problems with using synthetic panels to study saving behaviour in the Household Economic Survey. The first is that we do not have data on individual consumption (and hence saving), so we can only follow households, whose cohort is defined by the date of birth of the household head. Hence, we face problems of household dissolution and reformation: for example, when older people go to live with their children, previously “old” households become “young” households in subsequent years. There is no practical way to deal with this problem, given the nature of the data at hand, but we do attempt some sensitivity analyses based on “individual” measures of saving.

The second problem is that the assumption of fixed group membership may sometimes be hard to maintain. For example, if mortality and wealth are negatively related, cohort averages will reflect the fact that the population from which the samples are drawn becomes progressively richer as poorer individuals die younger (Attanasio and Banks, 1998). This second problem is related to the first because, rather than dying, the poorer elderly may also be absorbed into younger households. We attempt to deal with this problem by restricting the maximum age in our sample to 74 years, although we cannot rule out that wealth-related mortality sets in before this age.

The third problem is that the HES sample is fairly small (approximately 3,500 households per year), so many cells would contain rather few households if cells were formed by crossing each single birth year with each survey year. Such small cells would reduce the precision of any estimates formed using synthetic panel techniques. We respond to this problem by grouping household heads into five-year birth intervals.
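The five-year cohort definitions can be written as a small assignment rule. The sketch assumes that the head's birth year is approximated by survey year minus age and that the sample is restricted to heads aged 19-74; the function names are hypothetical. Pooling five birth years in this way makes each cell roughly five times larger than a single-birth-year cell would be (Table 2 reports an average of 257 households per cell).

```python
from typing import Optional

# Cohort definitions from Table 2; birth year = survey year - age is an approximation.
FIRST_BIRTH_YEAR = 1910      # Cohort 1 is born 1910-14
BAND_WIDTH = 5               # five-year birth intervals
MIN_AGE, MAX_AGE = 19, 74    # age range of household heads kept in the sample

def cohort_index(birth_year: int) -> int:
    """Cohort number: 1 = born 1910-14, 2 = 1915-19, ..., 14 = 1975-79."""
    return (birth_year - FIRST_BIRTH_YEAR) // BAND_WIDTH + 1

def assign_cohort(survey_year: int, head_age: int) -> Optional[int]:
    """Cohort of a household, or None if its head falls outside ages 19-74."""
    if not MIN_AGE <= head_age <= MAX_AGE:
        return None
    return cohort_index(survey_year - head_age)

print(assign_cohort(1984, 47))   # head born 1937 -> cohort 6
print(assign_cohort(1998, 75))   # too old for the sample -> None
```

With surveys from 1984 to 1998 and the 19-74 age limit, every admissible birth year falls between 1910 and 1979, so the index always lies between 1 and 14.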

Table 2 contains details of the five-year birth-interval cohorts, including the birth years, the ages observed and the average cell size. Some of the earliest and latest born cohorts are tracked across fewer of the survey years because otherwise the ages of their household heads would fall outside the range 19-74 years during the 1984-98 period.[9] The table also reports, for each cohort, the mean saving-to-consumption ratio, the same ratio at the 25th percentile, median and 75th percentile, and the ratio of average saving to average consumption. Any pattern in these figures across cohorts combines both age and birth cohort effects, because the ages over which household heads are observed also vary from one cohort to the next.

Table 2 – Cohort definitions, cell sizes and saving rates
Year of birth  Ages observed  Avg cell size  Total sample  Mean    25th pct  Median  75th pct  Avg S/Avg X
1910-14        70-74          116            581           0.497   0.002     0.317   0.802     0.311
1915-19        65-74          174            1,743         0.431   -0.037    0.241   0.674     0.263
1920-24        60-74          201            3,009         0.426   -0.069    0.235   0.657     0.267
1925-29        55-73          235            3,518         0.396   -0.083    0.216   0.647     0.253
1930-34        50-68          224            3,361         0.405   -0.076    0.242   0.631     0.266
1935-39        45-63          222            3,324         0.449   -0.051    0.265   0.711     0.277
1940-44        40-58          270            4,047         0.474   -0.053    0.251   0.701     0.304
1945-49        35-53          330            4,955         0.383   -0.070    0.221   0.600     0.257
1950-54        30-48          375            5,624         0.322   -0.112    0.187   0.559     0.196
1955-59        25-43          382            5,726         0.277   -0.103    0.162   0.507     0.164
1960-64        20-38          362            5,437         0.261   -0.103    0.158   0.470     0.164
1965-69        19-33          206            3,087         0.254   -0.097    0.167   0.495     0.173
1970-74        19-28          155            1,553         0.260   -0.099    0.168   0.521     0.194
1975-79        19-23          61             304           0.156   -0.236    0.030   0.392     0.033
All cohorts    19-74          257            46,269        0.353   -0.086    0.202   0.584     0.222
Mean, 25th pct, Median and 75th pct refer to the household saving-to-consumption ratio (S/X); Avg S/Avg X is the ratio of average saving to average consumption.
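The statistics in Table 2 weight households differently. The sketch below computes, for one hypothetical cell, the mean and quartiles of household saving-to-consumption ratios and the ratio of average saving to average consumption. The data and the function name `saving_summary` are illustrative, and Python's default quantile method may not match the percentile definition used in the paper.

```python
from statistics import mean, quantiles

# Hypothetical households in one cohort-year cell: (saving S, consumption X).
households = [
    (2_000, 30_000),
    (-1_000, 25_000),
    (8_000, 40_000),
    (15_000, 35_000),
    (500, 20_000),
]

def saving_summary(cell):
    """Cell statistics of the kind reported in Table 2."""
    ratios = [s / x for s, x in cell]           # household saving-to-consumption ratios
    p25, p50, p75 = quantiles(ratios, n=4)      # Python's default 'exclusive' method
    return {
        "mean": mean(ratios),                   # average of household ratios
        "p25": p25,
        "median": p50,
        "p75": p75,
        # ratio of average saving to average consumption (not the same as "mean")
        "avgS_over_avgX": sum(s for s, _ in cell) / sum(x for _, x in cell),
    }

for name, value in saving_summary(households).items():
    print(f"{name:>15}: {value:.3f}")
```

The mean of ratios gives every household equal weight, whereas the ratio of averages implicitly weights households by their consumption, so the two can differ noticeably, as they do in Table 2 (0.353 against 0.222 for all cohorts).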

One way to hold age constant, so that any cohort effect can be observed, is to focus on the ages at which adjacent cohorts overlap. To do this, each cohort’s ‘age’ is based on the median year of birth within its five-year birth interval.[10] Because the cohorts are defined by five-year intervals and we have 15 years of data, each cohort can overlap with the next one at up to ten ages.[11]
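The overlap calculation can be checked mechanically. The sketch assumes a cohort enters a survey year whenever at least some of its household heads are aged 19-74 (a rule inferred here, not stated in the text). Under that assumption the number of survey years per cohort is consistent with the cell sizes in Table 2, and the code reproduces the "Ages of Overlap" column of Table 3, including the ten-age overlap for interior cohorts. The function names are hypothetical.

```python
from typing import Tuple

FIRST_SURVEY, LAST_SURVEY = 1984, 1998   # survey years used in the paper
MIN_AGE, MAX_AGE = 19, 74                 # age limits for household heads

def birth_band(k: int) -> Tuple[int, int]:
    """First and last birth year of cohort k (1 = 1910-14, ..., 14 = 1975-79)."""
    first = 1910 + 5 * (k - 1)
    return first, first + 4

def survey_years(k: int) -> range:
    """Assumed rule: cohort k is used in any survey year in which at least
    some of its household heads are aged 19-74."""
    first, last = birth_band(k)
    return range(max(FIRST_SURVEY, first + MIN_AGE),
                 min(LAST_SURVEY, last + MAX_AGE) + 1)

def median_age_span(k: int) -> Tuple[int, int]:
    """Youngest and oldest 'age' of cohort k, measured from its median birth year."""
    median_birth = birth_band(k)[0] + 2
    years = survey_years(k)
    return years[0] - median_birth, years[-1] - median_birth

def overlap(k: int) -> Tuple[int, int]:
    """Ages at which cohort k and cohort k + 1 are both observed."""
    lo_a, hi_a = median_age_span(k)
    lo_b, hi_b = median_age_span(k + 1)
    return max(lo_a, lo_b), min(hi_a, hi_b)

for k in range(1, 14):
    lo, hi = overlap(k)
    print(f"Cohorts {k}, {k + 1}: ages {lo}-{hi} ({hi - lo + 1} ages)")
```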

Table 3 contains estimates of the mean and median saving rate for each pair of adjacent cohorts, averaged over the ages at which the two cohorts overlap. For both the mean and the median, the first four rows of the table, corresponding to households whose heads were born between 1910 and 1934, show a negative cohort effect: each later-born cohort has a lower average saving rate than the earlier-born cohort had at the same age. This pattern is reversed when moving from Cohort 5 (household heads born in 1930-34) through to Cohort 11 (born in 1960-64), as each later-born cohort has a higher average saving rate than the earlier-born cohort had at the same age (the one exception is the median for Cohorts 10 and 11, which is virtually unchanged). This preliminary view of the raw data suggests that there may well be an important cohort pattern in saving among New Zealand households. However, more formal methods are needed to see whether these cohort effects persist, and are statistically significant, once more lifecycle structure is imposed on the data and allowance is made for other conditioning variables.[12]
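The comparison in Table 3 amounts to taking, for each pair of adjacent cohorts, the later-born cohort's average minus the earlier-born cohort's average at the same ages. The sketch below does this for the means reported in Table 3. The dictionary name and helper function are hypothetical, and the differences are raw, with no standard errors or conditioning variables, which is why the text treats them as preliminary.

```python
# (earlier-born, later-born) averages of means over overlapping ages, from Table 3.
table3_means = {
    (1, 2): (0.463, 0.412), (2, 3): (0.428, 0.396), (3, 4): (0.440, 0.340),
    (4, 5): (0.428, 0.385), (5, 6): (0.444, 0.502), (6, 7): (0.428, 0.550),
    (7, 8): (0.426, 0.440), (8, 9): (0.324, 0.368), (9, 10): (0.277, 0.301),
    (10, 11): (0.250, 0.264), (11, 12): (0.270, 0.266), (12, 13): (0.189, 0.281),
    (13, 14): (0.291, 0.127),
}

def cohort_effect(earlier: float, later: float) -> str:
    """Sign of the raw cohort difference at the same ages."""
    diff = later - earlier
    return "negative" if diff < 0 else "positive" if diff > 0 else "none"

for pair, (earlier, later) in table3_means.items():
    print(pair, round(later - earlier, 3), cohort_effect(earlier, later))
```

The same loop applied to the medians column gives a similar picture, apart from the near-zero difference between Cohorts 10 and 11.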

Table 3 – Mean and median saving rates, averages over overlapping ages
Cohorts  Ages of overlap  Average of means  Average of medians
1, 2     72-76            0.463, 0.412      0.293, 0.249
2, 3     67-76            0.428, 0.396      0.263, 0.222
3, 4     62-71            0.440, 0.340      0.238, 0.183
4, 5     57-66            0.428, 0.385      0.248, 0.204
5, 6     52-61            0.444, 0.502      0.261, 0.291
6, 7     47-56            0.428, 0.550      0.276, 0.288
7, 8     42-51            0.426, 0.440      0.238, 0.264
8, 9     37-46            0.324, 0.368      0.197, 0.231
9, 10    32-41            0.277, 0.301      0.164, 0.177
10, 11   27-36            0.250, 0.264      0.154, 0.153
11, 12   22-31            0.270, 0.266      0.176, 0.180
12, 13   17-26            0.189, 0.281      0.125, 0.171
13, 14   17-21            0.291, 0.127      0.162, 0.035
Ages are based on each cohort’s median year of birth; in each cell the first figure is for the earlier-born cohort and the second for the later-born cohort.

In addition to comparing the average saving rates across cohorts, we can also track the saving rate of each cohort across successive survey years, again basing each cohort’s ‘age’ on the median year of birth within its five-year birth interval. Figure 1 plots these saving rates against age for each cohort, with the mean saving rate in the top panel and the median saving rate in the bottom panel. As an example of how the cohorts are tracked, Cohort 6 had a median age of 47 in 1984, so the 1984 survey was used to calculate the average saving rate for all households whose head was born in 1935-39, and the result is plotted as the first point on the line marked “6” (a median saving rate of 0.17). The rest of the line comes from the other surveys, tracking households whose head was born in 1935-39 until they are last observed at (median) age 61 in the 1998 survey (a median saving rate of 0.33).
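Each line in Figure 1 is obtained by re-indexing the cohort-by-year cell statistics by age, using the cohort's median birth year. A minimal sketch follows, with hypothetical function names, using only the two Cohort 6 medians quoted above; in practice the input would hold every cohort-year cell, and the resulting series could be passed to any plotting library.

```python
from collections import defaultdict

def median_birth_year(k: int) -> int:
    """Median year of birth for cohort k (cohort 1 = born 1910-14, so 1912)."""
    return 1910 + 5 * (k - 1) + 2

def figure_series(cell_stats):
    """Turn {(cohort, survey_year): statistic} into {cohort: [(age, statistic), ...]},
    sorted by age, where age = survey year - median birth year."""
    series = defaultdict(list)
    for (k, year), value in cell_stats.items():
        series[k].append((year - median_birth_year(k), value))
    return {k: sorted(points) for k, points in series.items()}

# Only the two Cohort 6 medians quoted in the text; all other cells are omitted.
cells = {(6, 1984): 0.17, (6, 1998): 0.33}
print(figure_series(cells))  # {6: [(47, 0.17), (61, 0.33)]}
```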

The immediate impression from both the mean and median saving rates in Figure 1 is the substantial amount of noise in the estimated averages. Because each point summarises a cell that holds, on average, about 250 households, this noise reflects the great variability in saving behaviour across households. Even so, there is a “hump” shape in these graphs, with average saving rates highest from the mid-40s until household heads reach their 60s. Any decline in saving rates after the peak saving years is more apparent at the median than at the mean. The cohort effects can also be seen from the variation in saving rates across cohorts at the same age (i.e., by taking a vertical section anywhere through Figure 1).

Notes

  • [7]The “snapshot” offered by a single cross-section is also unsuitable for observing life-cycle patterns because although a variety of ages are observed in a cross-section, they also represent different birth cohorts. If there are strong cohort effects, a cross-section age profile may be very different from the age profile of any individual, as noted by Shorrocks (1975).
  • [8]Verbeek and Nijman (1992) note that treating averages of cohorts as if they were genuine panel data may result in inconsistent estimates if the unobservable individual fixed effects are correlated with the explanatory variables. However, provided that the true means in each cohort exhibit sufficient time variation and the cohort sizes are sufficiently large (they suggest 100 to 200), the bias arising from ignoring this errors-in-variables problem is likely to be quite small.
  • [9]Specifically, Cohorts 1-3 and 12-14 with birth years 1910-24 and 1965-79.
  • [10]For example, for Cohort 1, where household heads were born in 1910-14, we treat the year of birth as 1912.
  • [11]For example, Cohort 6 (born in 1935-39) is observed between (median) ages 47 and 61, while Cohort 7 (born in 1940-44) is observed between (median) ages 42 and 56, giving overlapping ages between 47 and 56. However, the earliest and latest born cohorts are observed for fewer years and hence have fewer years of overlap.
  • [12]We undertake such an analysis in Section 4.