The Treasury

3.2  Estimating the determinants of hourly earnings

As mentioned earlier, there is a natural sequence in estimating the determinants of work experience prior to hourly earnings. Accumulated work experience is a ‘pre-determined’ variable when hourly earnings are observed among workers at the interview at age 21. Thus, systematic differences in work histories between non-Maori and Maori can influence earnings capacity indirectly through experience. We want to know whether Maori face lower wages, on average, compared to non-Maori with the same experience, qualifications and other relevant factors.

Consider the following regression specification for hourly earnings:

(5)    lnWi = αEXPi + πMAORIi + Z′iζ + εi

where the dependent variable is the natural logarithm of hourly earnings. The log wage is assumed to be a linear function of actual work experience and other productivity characteristics contained in the vector Zi. The coefficient π indicates whether Maori workers, on average, receive lower wages than non-Maori with the same observable characteristics.
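Because the dependent variable is in logs, a coefficient such as π can be read as a proportional wage difference. The sketch below converts a log-point coefficient into a percentage gap using the standard transformation 100·(exp(π) − 1). The function name and the coefficient value are hypothetical and are not estimates from this study.

```python
import math


def percent_gap(log_coefficient: float) -> float:
    """Convert a coefficient on a dummy variable in a log-wage regression
    into the implied percentage difference in hourly earnings."""
    return 100.0 * (math.exp(log_coefficient) - 1.0)


# Hypothetical value of pi, chosen only to illustrate the conversion.
pi_hat = -0.05
print(f"pi = {pi_hat:+.2f} log points implies a {percent_gap(pi_hat):+.2f}% wage difference")
```

For small coefficients the percentage gap is close to 100·π, which is why π is often quoted directly as an approximate proportional difference.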

The hourly earnings regression can be estimated only for the subsample of those who were working and reporting a wage rate at the time of the survey. We might, however, want to ask a broader question than the one posed above: do all Maori, whether working or not, face lower wages on average than non-Maori with the same experience, qualifications and other relevant factors? To answer this question we would need to consider the issue of sample selection bias. Workers at a point in time may be a non-random sample of individuals. The problem is that unobserved factors that influence the work outcome may be correlated with unobserved factors that influence the wages that these individuals face in the labour market. This could bias the estimated coefficients in equation (5) when they are interpreted as applying to all individuals.

The procedures that could be used to correct for possible sample selection bias have been extensively examined in the literature.[5] Suppose an individual will be observed working at age 21 if a latent variable Y*i is positive. Furthermore, suppose that this variable is a linear function of a set of covariates that include previous work experience, ethnicity and many of the family background variables used in the regression models in the previous section (Xi).

(6)     Y*i = ηEXPi + τMAORIi + X′iψ + ωi

The problem is that the disturbance terms εi and ωi may be correlated. Although the dependent variable in equation (6) is unobserved, its binary counterpart Yi is observed and this indicator variable simply depends on the sign of Y*i.

(7)     Yi = 1 iff Y*i > 0 (the individual is working)

        Yi = 0 iff Y*i ≤ 0 (the individual is not working)

Hourly earnings are observed only among those working at the time of the survey, so the relevant expectation of the log wage is conditional on Yi = 1. This conditional expectation can be incorporated in the earlier wage equation (5) by writing:

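(8)     E(lnWi | Yi = 1) = αEXPi + πMAORIi + Z′iζ + E(εi | Yi = 1)

= αEXPi + πMAORIi + Z′iζ + E(εi | ωi > −(ηEXPi + τMAORIi + X′iψ))

= αEXPi + πMAORIi + Z′iζ + θ[φ(ηEXPi + τMAORIi + X′iψ)/Φ(ηEXPi + τMAORIi + X′iψ)]

= αEXPi + πMAORIi + Z′iζ + θλi

where φ and Φ are the standard normal density and distribution functions, and λi is the inverse Mills ratio, the term that determines the mean of a truncated normal distribution. The last two lines rely on the standard assumption that εi and ωi are jointly normally distributed, with the variance of ωi normalised to one, so that θ is the covariance between the two disturbances. The sample selection variable λi can be computed from the maximum likelihood estimation of a probit model on this dichotomous work outcome.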

Although not strictly necessary, identification of this sample selection term largely depends on a set of variables that are included in the work regression but excluded from the wage regression. (Without such exclusion restrictions, identification rests solely on the nonlinearity of λi.) One could argue that the CHDS offers a number of variables on family background that might serve as ‘instrumental variables’ for λi. These are factors that influence the propensity to be employed, but are not directly observable by potential employers and should therefore not influence market wages. Such background characteristics are generally unavailable to researchers with cross-sectional data.[6] Yet, there is little reason a priori to believe that any of these background factors should necessarily be excluded from the wage regression.

As a result, we choose a different approach in this study. The vector Zi is allowed first to exclude, and then to include, the same personal and family background factors used in the earlier work experience regression. These short and long regression specifications will indicate whether or not this information has any impact on the coefficients attached to ethnicity. In other words, we use detailed information on personal and family background characteristics in the CHDS to mitigate the possibility of omitted-variable bias in estimating the impact of ethnicity on wage rates.
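The logic of comparing short and long specifications can be sketched with a small synthetic example. The data below are invented, not drawn from the CHDS: log wages are generated without any direct ethnicity effect, but a single background factor that raises wages is less common among Maori in the made-up sample. The helper functions solve and ols are a minimal least-squares implementation written for this illustration only.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic workers: (EXP in years, MAORI dummy, background dummy).
workers = [(1, 0, 1), (2, 0, 1), (3, 0, 0), (4, 0, 1),
           (1, 1, 0), (2, 1, 0), (3, 1, 1), (4, 1, 0)]

# Log wage built with NO direct ethnicity effect (pi = 0 by construction).
log_wage = [2.0 + 0.05 * e + 0.10 * bg for e, _, bg in workers]

short_fit = ols([[1.0, e, m] for e, m, _ in workers], log_wage)       # Z excludes background
long_fit = ols([[1.0, e, m, bg] for e, m, bg in workers], log_wage)   # Z includes background

print(f"short regression: pi = {short_fit[2]:+.3f}")
print(f"long regression:  pi = {long_fit[2]:+.3f}")
```

In this constructed case the short regression attributes part of the background effect to ethnicity, while the long regression does not. A coefficient on ethnicity that changes little between the two specifications would instead suggest that the omitted background information is not driving the estimate.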

To see whether or not the wages facing all Maori are systematically lower than those facing all non-Maori, we use these estimated regressions on workers to predict the wages that face all youth. It is possible that non-working Maori have lower productivity characteristics (eg, less work experience and fewer qualifications) than non-working non-Maori. Even if Maori workers are not paid less than non-Maori workers, all Maori may face lower wages than all non-Maori. This technique essentially allows for the sample selection process to take place on observable (but not unobservable) characteristics.
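A minimal sketch of this prediction step follows. The coefficients stand in for estimates from the worker-only regression and the youth records are invented, so every number here is hypothetical. The ethnicity coefficient is set to zero to show that a gap in predicted wages can arise purely from differences in observable characteristics.

```python
from statistics import mean

# Hypothetical coefficients standing in for worker-only regression estimates.
COEF = {"const": 2.0, "exp": 0.05, "maori": 0.0, "qual": 0.10}

# Hypothetical youth records: (EXP in years, MAORI dummy, qualification dummy, working?)
youth = [
    (3.0, 0, 1, True), (2.5, 0, 1, True), (1.0, 0, 0, False),
    (3.0, 1, 1, True), (1.0, 1, 0, False), (0.5, 1, 0, False),
]


def predicted_log_wage(exp_years, maori, qual):
    """Predicted log hourly wage from the (hypothetical) worker-based coefficients."""
    return (COEF["const"] + COEF["exp"] * exp_years
            + COEF["maori"] * maori + COEF["qual"] * qual)


def group_mean(maori_flag, workers_only):
    """Mean predicted log wage for one ethnic group, for workers or for all youth."""
    return mean(
        predicted_log_wage(e, m, q)
        for e, m, q, working in youth
        if m == maori_flag and (working or not workers_only)
    )


for label, workers_only in [("workers only", True), ("all youth", False)]:
    gap = group_mean(1, workers_only) - group_mean(0, workers_only)
    print(f"{label}: Maori minus non-Maori mean predicted log wage = {gap:+.4f}")
```

In this made-up sample the working Maori are not predicted to earn less than working non-Maori, yet the gap across all youth is negative because the non-working Maori records carry less experience and fewer qualifications.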

Notes

  • [5] See Heckman (1979) and Maddala (1983) for background discussion on sample selection bias.
  • [6] See the discussion in Section 2 on the difficulties of correcting for sample selection bias in Alexander, Gene and Jaforullah (2001).