5.2 Empirical specification
We attempted to estimate the production function set out in the previous section, but found that intermediate inputs and capital stock were highly correlated with R&D expenditures (correlation coefficients of 0.97 and 0.95 respectively). We therefore constructed a multifactor productivity (MFP) index as our dependent variable, calculated as a Fisher index of GDP (gross output less intermediate inputs) divided by the weighted sum of capital and labour. Our final specification was:
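As an illustration of the index construction described above, the following Python sketch computes a Fisher quantity index between two periods as the geometric mean of the Laspeyres and Paasche quantity indexes, and divides it by an input index. The prices, quantities, input index value and function names are hypothetical; the sketch is not a reproduction of our actual calculation.

```python
import math

def laspeyres_quantity(p0, q0, q1):
    """Quantity change valued at base-period prices."""
    return sum(p * q for p, q in zip(p0, q1)) / sum(p * q for p, q in zip(p0, q0))

def paasche_quantity(p1, q0, q1):
    """Quantity change valued at current-period prices."""
    return sum(p * q for p, q in zip(p1, q1)) / sum(p * q for p, q in zip(p1, q0))

def fisher_quantity(p0, p1, q0, q1):
    """Fisher index: geometric mean of the Laspeyres and Paasche indexes."""
    return math.sqrt(laspeyres_quantity(p0, q0, q1) * paasche_quantity(p1, q0, q1))

# Hypothetical two-good example (illustrative numbers only).
p0, p1 = [1.0, 2.0], [1.5, 2.0]
q0, q1 = [10.0, 5.0], [12.0, 5.0]

output_index = fisher_quantity(p0, p1, q0, q1)
input_index = 1.05  # assumed growth in the weighted capital-labour aggregate
mfp_index = output_index / input_index
```

In this example the Laspeyres index is 1.10 and the Paasche index is 1.12, so the Fisher index lies between the two.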
(10)
where all variables are in logarithms, weather is our soil moisture deficit variable, extension is the number of extension workers, hk is our human capital index, W(B)RD is current and past domestic R&D expenditures, and W(B)RDf is current and past foreign R&D expenditures (here proxied by current and past patent numbers). These main variables are plotted in Figure 3 below.
We also ran this basic model including a dummy variable, equal to zero before 1984 and one from 1984 onwards. This provides a crude test of whether there is a structural break in our data. That is, have the changed institutional settings and economic environment induced by the reforms affected MFP in the agricultural sector?
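A minimal Python sketch of the dummy variable described above follows. The series and helper names are hypothetical. The sketch uses the fact that, in a regression on a constant and the dummy alone, the dummy coefficient equals the difference in sub-period means; in a model with further regressors the coefficient instead measures the shift conditional on those regressors.

```python
def break_dummy(years, break_year=1984):
    """0 before break_year, 1 from break_year onwards."""
    return [0 if y < break_year else 1 for y in years]

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical log-MFP series (illustrative numbers only).
years = list(range(1980, 1988))
log_mfp = [0.10, 0.12, 0.11, 0.13, 0.20, 0.22, 0.21, 0.23]
d = break_dummy(years)

# With only a constant and the dummy as regressors, the dummy coefficient
# is the difference in means between the two sub-periods.
pre = [y for y, flag in zip(log_mfp, d) if flag == 0]
post = [y for y, flag in zip(log_mfp, d) if flag == 1]
shift = mean(post) - mean(pre)
```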
Many factors are omitted from the typical production function set out above, including the learning by doing mentioned in section 5.1 and improved managerial and organisational practices. These omitted factors affect not only productivity growth but also the incentives to invest in R&D. Comin (2004) notes that some evidence of the potential importance of this omitted variable bias comes from the fact that, after Jones and Williams (1998) included fixed effects in their regression, the effect of R&D on TFP growth almost disappeared. However, due to data limitations we are unable to correct for this problem.
Another problem discussed in the literature is that of double counting. This occurs because expenditures on the labour and physical capital used in R&D are counted both in R&D expenditures and in the measures of labour and capital, and so should be removed from the measures of labour and capital used in production. Schankerman (1981) demonstrates that failing to remove this double counting biases the estimated R&D coefficients downward. Within the agricultural sector, this would be a problem only to the extent that research is carried out by farm owners and farm workers themselves. We believe this would have a minimal effect in New Zealand.
Another problem which arises in any economic time series analysis is that of non-stationary variables. Regressions involving non-stationary variables may produce spurious results. Szeto (2001) notes that there are three solutions to the problem of spurious regression. The first is to take first differences of the data before estimating. The second is to add the lagged value of the dependent variable. The third is the cointegration approach.[18] We employ the latter two approaches in this paper (discussed below).
To employ the cointegration approach, one must first establish whether the variables in the regression are I(1). We tested all of our series for unit roots using the Augmented Dickey-Fuller (ADF) test (see Appendix 1 for the unit root tests). All series appear to be non-stationary I(1) processes, with four exceptions: the human capital index and the public R&D stock were found to be stationary with drift, and the number of soil moisture deficit days and MFP were found to be stationary with drift and trend. However, the unit root test for MFP is very sensitive to the lag length chosen: for all lags greater than zero, the ADF test could not reject the null of a unit root in the series. The Phillips-Perron test also fails to reject the null of a unit root for the MFP series. We can therefore be fairly confident that we are regressing an I(1) variable on (mostly) non-stationary explanatory variables, and hence that there could be a cointegrating relationship (which is determined by testing the residuals for stationarity).
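The idea of testing residuals for stationarity can be sketched in Python as below. This is a simplified Dickey-Fuller regression with no constant and no lagged differences, applied to simulated series rather than to our data; the function name and the simulated inputs are assumptions of the illustration. In practice the statistic must be compared with critical values tabulated for residual-based cointegration tests, not with standard t tables.

```python
import math
import random

def df_tstat(e):
    """t-statistic on rho in  e_t - e_{t-1} = rho * e_{t-1} + u_t
    (no constant, no lagged differences)."""
    lagged = e[:-1]
    diffs = [b - a for a, b in zip(e[:-1], e[1:])]
    sxx = sum(x * x for x in lagged)
    rho = sum(x * d for x, d in zip(lagged, diffs)) / sxx
    resid = [d - rho * x for x, d in zip(lagged, diffs)]
    sigma2 = sum(u * u for u in resid) / (len(diffs) - 1)
    return rho / math.sqrt(sigma2 / sxx)

rng = random.Random(0)  # simulated shocks, not our series
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]

stationary = noise  # white noise: mean-reverting by construction

random_walk = []    # cumulated shocks: contains a unit root
level = 0.0
for shock in noise:
    level += shock
    random_walk.append(level)

t_stationary = df_tstat(stationary)    # expected to be strongly negative
t_random_walk = df_tstat(random_walk)  # expected to be much closer to zero
```

A strongly negative statistic on the residuals of the levels regression is what would support a cointegrating relationship; a statistic near zero would not.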
As discussed in the previous section, the fact that increments to the stock of knowledge may not be utilised the moment they become available means that the relationship between R&D expenditures and output will not be contemporaneous. Thus it is important to capture this in the estimation procedure.
Alston, Craig and Pardey (1998) highlight the importance of lag lengths in estimating the returns to research. Many studies have used relatively short lag lengths. These may well capture the link between investment in research and increments to the stock of knowledge. However, production depends on the flow of services from the entire stock of knowledge rather than only on recent additions to it. Using a model that allows the impact of research on productivity to last much longer than in conventional approaches, they find that the real marginal rate of return to research in the USA is much lower than that estimated in studies with inappropriate lag lengths.
However, one cannot simply include many lagged values of R&D expenditures, as this runs into problems of multicollinearity. To overcome this problem, it is necessary to impose some structure on the lags. We have adopted three different approaches. In the first we form estimates of the stock of knowledge (or R&D capital) using the Perpetual Inventory Method (PIM). In the second we use a Koyck transformation, and in the third we impose a polynomial lag structure.[19] Each approach is discussed in turn in the following sections.
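As a rough illustration of the first approach, the Python sketch below accumulates a flow of R&D expenditures into a stock using the recursion K_t = (1 - delta) K_{t-1} + R_t, and checks that this is the same as a weighted sum of past flows with geometric weights. The expenditure figures, the 10 per cent depreciation rate and the zero initial stock are assumptions chosen for illustration, not values used in our estimation.

```python
def pim_stock(flows, delta, initial_stock=0.0):
    """Accumulate flows into a stock: K_t = (1 - delta) * K_{t-1} + R_t."""
    stocks = []
    stock = initial_stock
    for flow in flows:
        stock = (1.0 - delta) * stock + flow
        stocks.append(stock)
    return stocks

def geometric_sum(flows, delta):
    """Final-period stock as a weighted sum with weights (1 - delta)**i,
    where i is the age of the flow (assumes a zero initial stock)."""
    return sum((1.0 - delta) ** i * flow
               for i, flow in enumerate(reversed(flows)))

# Hypothetical R&D expenditures and an assumed 10% depreciation rate.
rd = [100.0, 110.0, 120.0, 130.0]
stocks = pim_stock(rd, delta=0.10)
final_by_weights = geometric_sum(rd, delta=0.10)
```

With these numbers the stock rises from 100 to 400 over the four periods, and the recursion and the weighted sum give the same final stock.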
- [18] Non-stationary variables may be used in a levels regression if they prove to be cointegrated.
- [19] The Perpetual Inventory Method is a model whereby past flows are accumulated into a stock using weights. All three of our approaches can therefore be classified as using the PIM. The difference between what we label the “PIM” models and our Almon models lies in the weights used in the accumulation: the “PIM” models use geometric weighting. The difference between our “PIM” models and our Koyck transformation models lies in the estimation procedure: our “PIM” models assume a depreciation rate and enter the accumulated stock directly into the production function, whereas the Koyck models estimate the weights from the regression model once the transformation has been applied. Thus, for simplicity, we reserve the label “PIM” for the geometrically weighted models in which the depreciation rate is assumed.
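
The link between geometric weighting and the Koyck transformation noted in footnote 19 can be sketched as follows. The notation is generic and is an assumption of this illustration rather than the notation of our estimated models: y_t is the dependent variable, x_t is the R&D flow, and lambda is the geometric decay parameter, so that 1 - lambda plays the role of the depreciation rate.

```latex
\begin{align*}
y_t &= \alpha + \beta \sum_{i=0}^{\infty} \lambda^{i} x_{t-i} + \varepsilon_t,
  \qquad 0 < \lambda < 1, \\
\lambda y_{t-1} &= \lambda\alpha + \beta \sum_{i=1}^{\infty} \lambda^{i} x_{t-i}
  + \lambda\varepsilon_{t-1}, \\
y_t - \lambda y_{t-1} &= (1-\lambda)\alpha + \beta x_t
  + \varepsilon_t - \lambda\varepsilon_{t-1}.
\end{align*}
```

Rearranged, y_t = (1 - lambda) alpha + beta x_t + lambda y_{t-1} + u_t, so the decay parameter is estimated as the coefficient on the lagged dependent variable rather than assumed in advance. The transformed error u_t = epsilon_t - lambda epsilon_{t-1} is a moving average and is correlated with y_{t-1}, which needs to be borne in mind when choosing an estimator.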
