2.2 Dealing with missing data
Merging demographic and employment data with GST sales and purchases data highlighted several issues. First, some enterprises had GST sales information but no employment data for the entire period they existed, or conversely had employment data but no GST sales information. Because a measure of firm labour productivity cannot be formed when either employment or GST sales data are missing for the entire period the firm existed, these firms were dropped.[8] Second, some enterprises had information on employment or GST sales for only part of the period the firm was recorded as existing. When the gap occurred in the middle of a firm’s existence, the missing observations were filled, where possible, using historical imputation, that is, by carrying forward the most recent earlier observation. For example, a firm in existence between 1994 and 2003 with GST sales for the corresponding period but missing employment data in 1996 and 1997 would have the missing 1996 and 1997 employment data filled using employment in 1995. If historical imputation was not possible, the missing values were imputed using data from subsequent years.
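
The fill rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used to build the dataset: the function name `impute_series`, the representation of a firm's annual series as a list with `None` for missing years, and the employment figures in the example are all hypothetical.

```python
from typing import List, Optional


def impute_series(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill gaps in one firm's annual series (hypothetical helper).

    Step 1 (historical imputation): carry the most recent earlier
    observation forward into each missing year.
    Step 2 (fallback): where no earlier observation exists, use the
    next later observation instead.
    """
    filled = list(values)

    # Forward pass: fill from the last observed value, if there is one.
    last_seen = None
    for i, value in enumerate(filled):
        if value is None:
            if last_seen is not None:
                filled[i] = last_seen
        else:
            last_seen = value

    # Backward pass: only leading gaps are still missing at this point.
    next_seen = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            if next_seen is not None:
                filled[i] = next_seen
        else:
            next_seen = filled[i]

    return filled


# Made-up employment for 1994-2003, missing in 1996 and 1997.
employment = [5.0, 6.0, None, None, 8.0, 8.0, 9.0, 9.0, 10.0, 10.0]
print(impute_series(employment))
```

In this example the 1996 and 1997 gaps both take the 1995 value, matching the case described in the text. A series with no observations at all is returned unchanged, which corresponds to the firms that were dropped rather than imputed.
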
A partial explanation for situations similar to the example above is that some firms fail to respond to the Annual Business Frame Update (ABFU) questionnaire even though they are still operating (as indicated by the firm filing GST sales of $30,000 or greater). The non-response rate for the ABFU is estimated to be between 10% and 15%. This approach to imputing missing values has also been adopted by Statistics New Zealand in other contexts.
In some cases a firm was in existence in the BD but had no recorded information on total employment and GST sales or purchases at the beginning or end of its life. In these situations the firm was deemed to be an entrant in the first period in which either employment or GST data were available, and was deemed to have ‘ceased’ in the period following the last observation of either employment or GST sales.[9]
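
The entry and cessation rule can be illustrated with a short Python sketch. The function `entry_and_cessation` and the dictionary layout (year mapped to a value, or `None` when missing) are assumptions made for this example, and the figures are invented.

```python
from typing import Dict, Optional, Tuple


def entry_and_cessation(
    employment: Dict[int, Optional[float]],
    gst_sales: Dict[int, Optional[float]],
) -> Optional[Tuple[int, int]]:
    """Return (entry year, cessation year) for one firm (hypothetical helper).

    Entry is the first year with either employment or GST sales data.
    Cessation is the year after the last year with either series observed.
    Returns None when neither series is ever observed.
    """
    observed_years = sorted(
        year
        for year in set(employment) | set(gst_sales)
        if employment.get(year) is not None or gst_sales.get(year) is not None
    )
    if not observed_years:
        return None
    return observed_years[0], observed_years[-1] + 1


# Invented firm: recorded as existing 1994-1998, data only for 1995-1997.
employment = {1994: None, 1995: None, 1996: 4.0, 1997: 4.0, 1998: None}
gst_sales = {1994: None, 1995: 50000.0, 1996: 61000.0, 1997: None, 1998: None}
print(entry_and_cessation(employment, gst_sales))
```

Here the firm is treated as entering in 1995, when GST sales are first observed, and as ceasing in 1998, the year after its last employment observation. The sketch does not handle the end of the sample: a firm last observed in 2003 would be assigned a cessation year outside the data.
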
Table 1 shows the number of firms deemed to be in operation in each year between 1994 and 2003 and the percentage of observations on sales, purchases and total hours in each year that were missing and subsequently filled as described above. Missing observations on total hours are more prevalent than those on sales or purchases. Across all years 22% of observations on total hours are missing,[10] compared with 11% for each of sales and purchases. The proportion of firm-year observations with missing information on at least one of sales, purchases or total hours is 29%. Fewer observations are missing in 1994 and 2003 than in other years, particularly on sales and purchases. This is likely because potential missing observations for firms existing in earlier years (in the case of 1994) and later years (in the case of 2003) cannot be observed and therefore cannot be taken into account in the process used for imputing missing data.
| Year | Firm count | Sales % | Purchases % | Hours % | Any data missing % |
|---|---|---|---|---|---|
| 1994 | 183,769 | 3 | 4 | 19 | 22 |
| 1995 | 213,846 | 9 | 9 | 20 | 27 |
| 1996 | 230,213 | 11 | 11 | 21 | 28 |
| 1997 | 239,453 | 12 | 12 | 25 | 31 |
| 1998 | 249,246 | 12 | 12 | 23 | 29 |
| 1999 | 257,103 | 12 | 12 | 25 | 31 |
| 2000 | 265,058 | 15 | 15 | 23 | 31 |
| 2001 | 265,867 | 15 | 15 | 23 | 33 |
| 2002 | 268,558 | 10 | 11 | 20 | 28 |
| 2003 | 256,430 | 4 | 5 | 19 | 23 |
| All years | 2,429,543 | 11 | 11 | 22 | 29 |
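
As a rough consistency check on Table 1, the "All years" row can be approximately reproduced from the yearly rows. The sketch below assumes, which the text does not state, that the pooled percentage is a firm-count-weighted average of the yearly percentages. Because the yearly figures are rounded to whole percentages, the result can only approximate the published value.

```python
# Year: (firm count, percentage of total-hours observations missing),
# copied from Table 1.
table1 = {
    1994: (183_769, 19),
    1995: (213_846, 20),
    1996: (230_213, 21),
    1997: (239_453, 25),
    1998: (249_246, 23),
    1999: (257_103, 25),
    2000: (265_058, 23),
    2001: (265_867, 23),
    2002: (268_558, 20),
    2003: (256_430, 19),
}

# Total firm-year observations across the ten years.
total_firm_years = sum(count for count, _ in table1.values())

# Firm-count-weighted average of the yearly missing-hours percentages
# (an assumed aggregation method, see the note above).
weighted_hours_missing = (
    sum(count * pct for count, pct in table1.values()) / total_firm_years
)

print(total_firm_years)
print(round(weighted_hours_missing))
```

The firm-year total matches the 2,429,543 in the "All years" row, and the weighted average rounds to the 22% reported for total hours.
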
Table 2 shows, for each year, the totals of all firms’ sales, purchases, value-added and hours worked after missing observations have been filled. Total value-added ranges between 80.9 and 96.2 percent of GDP over the period. Total sales and total purchases in each year are on average approximately 3.5 and 2.5 times total value-added respectively. Total hours worked per year averages 2,730 million over the entire period (or around 1.4 million full time equivalent employees[11]), which represents on average around 90 percent of economy-wide total hours worked (ranging from 83.3 percent in 1994 to 94.5 percent in 1999).
| Year | Total sales ($million) | Total purchases ($million) | Total value-added ($million) | Percentage of economy-wide GDP | Total hours worked (millions) | Percentage of economy-wide hours worked |
|---|---|---|---|---|---|---|
| 1994 | 248,960 | 177,793 | 71,038 | 81.3 | 2,271 | 83.3 |
| 1995 | 269,904 | 195,003 | 74,778 | 81.6 | 2,492 | 87.2 |
| 1996 | 284,381 | 207,051 | 76,915 | 80.9 | 2,627 | 89.0 |
| 1997 | 296,996 | 213,584 | 82,596 | 84.3 | 2,702 | 91.4 |
| 1998 | 301,134 | 215,433 | 84,798 | 86.7 | 2,747 | 93.3 |
| 1999 | 310,654 | 222,401 | 86,964 | 86.4 | 2,808 | 94.5 |
| 2000 | 331,368 | 235,233 | 94,621 | 89.8 | 2,855 | 93.9 |
| 2001 | 347,086 | 241,402 | 103,592 | 96.2 | 2,870 | 92.7 |
| 2002 | 340,712 | 232,790 | 105,771 | 93.9 | 2,957 | 93.1 |
| 2003 | 344,502 | 232,513 | 108,684 | 92.9 | 2,971 | 91.9 |
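
The conversion from average hours worked to full time equivalent employees can be checked directly from Table 2, using the definition in note [11] (40 hours per week for 48 weeks per year). The variable names below are chosen for this illustration only.

```python
# Hours represented by one full time equivalent employee (note [11]).
HOURS_PER_FTE = 40 * 48  # 1,920 hours per year

# Total hours worked (millions), 1994 to 2003, copied from Table 2.
total_hours_millions = [2271, 2492, 2627, 2702, 2747, 2808, 2855, 2870, 2957, 2971]

# Average annual hours over the ten years, in millions.
average_hours_millions = sum(total_hours_millions) / len(total_hours_millions)

# Implied full time equivalent employees, in millions.
fte_millions = average_hours_millions / HOURS_PER_FTE

print(average_hours_millions)
print(round(fte_millions, 2))
```

The average comes to 2,730 million hours, or about 1.42 million full time equivalent employees, consistent with the "around 1.4 million" quoted in the text.
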
Notes
- [8]Approximately 344,000 firms were first removed from the dataset due to missing employment information in all years they were observed. A further 77,000 firms were then removed due to missing GST sales in all years they were observed.
- [9]This situation may occur because i) SNZ is unable to determine whether non-response to the ABFU is genuine non-response or arises because the enterprise has ceased operating; or ii) the enterprise continues to file GST returns as it sells off assets even though it has ceased trading.
- [10]This is in addition to any historical imputation that Statistics New Zealand may have undertaken.
- [11]One full time equivalent employee represents 1920 hours of labour (i.e. 40 hours per week for 48 weeks per year).
