2 The Statistical Model
The estimation of wage equations involves a system of two correlated equations, the first of which determines selection (employment) using a probit equation, while the second determines wage rates, conditional on employment. The correlation between the two equations accounts for the possible selection into work of those with higher wage rates. The wages of workers may therefore not represent the wages of non-workers.
First, consider the selection equation. Each individual's observed employment outcome is regarded as the result of an unobservable index, $E_i^*$, of the tendency to participate in the labour force and of employability (based on the probability of someone's market wage being more than their reservation wage), which varies with observed personal characteristics, $z_i$. The vector $z_i$ may include both supply and demand side variables. Hence:
(1) $E_i^* = z_i'\gamma + u_i$
where $\gamma$ is a vector of coefficients and $u_i$ is assumed to be independently distributed as N(0, 1)[2]. The realisation of $E_i^*$ determines whether the individual is employed ($E_i = 1$), or unemployed or out of the labour force ($E_i = 0$), such that:
(2) $P(E_i = 1) = P(E_i^* > 0) = \Phi(z_i'\gamma)$
where $\Phi(z_i'\gamma)$ is the standard normal distribution function evaluated at $z_i'\gamma$. The associated normal density function is denoted $\phi(z_i'\gamma)$. The parameters of (2) can be consistently estimated by a standard probit model; see Maddala (1983).
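As a minimal sketch of what the probit step evaluates, the employment probability in (2) and the corresponding sample log-likelihood can be written as follows. The function names and the list-based data layout are illustrative assumptions, not part of the original analysis, and only the Python standard library is used.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, variance 1


def employment_probability(z, gamma):
    """P(E_i = 1) = Phi(z_i' gamma) for one individual, as in equation (2)."""
    index = sum(zk * gk for zk, gk in zip(z, gamma))
    return STD_NORMAL.cdf(index)


def probit_log_likelihood(gamma, Z, E):
    """Sum over the sample of log P(E_i = e_i), where e_i is 0 or 1."""
    total = 0.0
    for z, e in zip(Z, E):
        p = employment_probability(z, gamma)
        total += log(p) if e == 1 else log(1.0 - p)
    return total
```

A probit routine chooses the coefficient vector that maximises `probit_log_likelihood`; in practice this would normally be left to a statistical package.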
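Let $w_i$ denote the logarithm of the wage rate and $x_i$ a vector of characteristics of individual $i$. The regression model is written as:
(3) $w_i = x_i'\beta + \varepsilon_i$
The $u_i$ from equation (1) and $\varepsilon_i$ are assumed to be jointly normally distributed as N(0, 0, 1, $\sigma_\varepsilon^2$, $r$)[3]. In the first approach, equations (1) and (3) are estimated simultaneously, where $(\varepsilon_i, u_i) \sim N(0, \Sigma)$, with
$\Sigma = \begin{pmatrix} \sigma_\varepsilon^2 & r\sigma_\varepsilon \\ r\sigma_\varepsilon & 1 \end{pmatrix}$.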
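The text does not spell out the likelihood used for joint estimation. Under the joint normality assumption above, the usual contributions are $\Phi(-z_i'\gamma)$ for a non-worker and the wage density multiplied by the conditional employment probability for a worker. The sketch below evaluates that log-likelihood; the function name, argument order and list-based data layout are assumptions made for illustration.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ak * bk for ak, bk in zip(a, b))


def selection_log_likelihood(gamma, beta, sigma, r, Z, X, w, E):
    """Joint log-likelihood of the selection and wage equations.

    sigma is the standard deviation of the wage error and r its correlation
    with the selection error (requires -1 < r < 1). Wages of non-workers
    are never used, so w may hold None for them.
    """
    total = 0.0
    for z, x, wi, e in zip(Z, X, w, E):
        index = dot(z, gamma)
        if e == 1:
            resid = (wi - dot(x, beta)) / sigma  # standardised wage residual
            # log density of the observed wage
            total += log(STD_NORMAL.pdf(resid)) - log(sigma)
            # log probability of employment given the wage residual
            total += log(STD_NORMAL.cdf((index + r * resid) / sqrt(1.0 - r * r)))
        else:
            # probability of not being employed
            total += log(STD_NORMAL.cdf(-index))
    return total
```

With `r = 0` the expression separates into a probit log-likelihood plus an ordinary normal regression log-likelihood, which is a convenient check on any implementation.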
An alternative, frequently used approach is to include an additional term in the wage equation indicating the tendency to participate, which also corrects for this selection process without the need to estimate the wage and selection equations jointly (Heckman, 1979). This approach consists of two steps. In the first step, equation (2) is estimated, after which an estimate, $\hat\lambda_i$, of the inverse Mills ratio for a working individual $i$ is obtained using:
(4) $\hat\lambda_i = \phi(z_i'\hat\gamma) / \Phi(z_i'\hat\gamma)$
where $\hat\gamma$ is the probit estimate of the coefficient vector in (1). Then in the second step, in order to avoid selectivity bias, a correction term is added to (3):
(5) $w_i = x_i'\beta + r\sigma_\varepsilon\hat\lambda_i + \upsilon_i$
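The correction term for a working individual is the ratio of the standard normal density to the standard normal distribution function, both evaluated at the estimated selection index. A minimal sketch, with a hypothetical function name and using only the Python standard library:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def inverse_mills_ratio(index):
    """Return phi(index) / Phi(index), where index is the estimated
    selection index for a working individual.

    For very negative values of index the distribution function underflows
    to zero, so a production implementation would need a more careful
    tail approximation.
    """
    return STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)
```

The ratio falls as the index rises, so individuals with a high predicted employment probability receive a small correction, and those who work despite a low predicted probability receive a large one.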
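Equation (5) takes into account the correlation between $u_i$ and $\varepsilon_i$. It can be seen that the variance of $\upsilon_i$, $\sigma_{\upsilon_i}^2$, differs across individuals, so that $\upsilon_i$ is heteroscedastic, since:
(6) $\sigma_{\upsilon_i}^2 = \sigma_\varepsilon^2 (1 - r^2 \delta_i)$
where:
(7) $\delta_i = \lambda_i (\lambda_i + z_i'\gamma)$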
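To illustrate why the error variance differs across individuals, the sketch below evaluates the standard result for the variance of the second-step error, $\sigma_\varepsilon^2(1 - r^2\delta_i)$ with $\delta_i = \lambda_i(\lambda_i + z_i'\gamma)$, at a given selection index. The function names are hypothetical and the formula is the textbook expression for this model rather than something taken verbatim from the source.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def delta(index):
    """delta_i = lambda_i * (lambda_i + index), with lambda_i the inverse
    Mills ratio evaluated at the selection index."""
    lam = STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)
    return lam * (lam + index)


def selection_error_variance(index, sigma_eps, r):
    """Variance of the second-step error for a working individual:
    sigma_eps^2 * (1 - r^2 * delta_i)."""
    return sigma_eps ** 2 * (1.0 - r ** 2 * delta(index))
```

Because `delta` depends on each individual's selection index, the variance is constant only when `r` is zero, which is why uncorrected second-step standard errors are not valid.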
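Consistent estimation of this model can be carried out using the convenient two-step procedure of first estimating the probit model for the employment probability and calculating the predicted value of the inverse Mills ratio, and then including this predicted value as an additional regressor in the wage equation. Because the second-step errors are heteroscedastic and the inverse Mills ratio is itself estimated, the usual least squares standard errors are not valid; Greene (1981) shows how to calculate the corrected standard errors.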
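The two-step procedure can be sketched end to end on simulated data. Everything below is an illustrative assumption rather than the authors' implementation: the data-generating values, the variable names, the Newton-Raphson probit routine and the small linear solver are written with the Python standard library only, and the corrected standard errors are not computed.

```python
import random
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ols(X, y):
    """Least squares coefficients from the normal equations."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)


def probit(Z, E, iterations=20):
    """Probit maximum likelihood by Newton-Raphson, starting from zero."""
    k = len(Z[0])
    gamma = [0.0] * k
    for _ in range(iterations):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for z, e in zip(Z, E):
            a = dot(z, gamma)
            q = 1.0 if e == 1 else -1.0
            g = q * N01.pdf(a) / N01.cdf(q * a)  # score weight
            h = g * (g + a)                      # minus the Hessian weight
            for i in range(k):
                score[i] += g * z[i]
                for j in range(k):
                    info[i][j] += h * z[i] * z[j]
        gamma = [gk + sk for gk, sk in zip(gamma, solve(info, score))]
    return gamma


def simulate(n, seed=0):
    """Hypothetical data: q affects employment only (exclusion restriction)."""
    rng = random.Random(seed)
    gamma, beta, sigma, r = [0.3, 0.5, 1.0], [1.0, 0.5], 0.8, 0.6
    Z, X, w, E = [], [], [], []
    for _ in range(n):
        x, q, u = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        eps = sigma * (r * u + sqrt(1 - r * r) * rng.gauss(0, 1))
        Z.append([1.0, x, q])
        X.append([1.0, x])
        E.append(1 if dot(Z[-1], gamma) + u > 0 else 0)
        w.append(dot(X[-1], beta) + eps)  # only used when E == 1
    return Z, X, w, E


def heckman_two_step(Z, X, w, E):
    """Step 1: probit. Step 2: OLS on workers with the inverse Mills ratio."""
    gamma_hat = probit(Z, E)
    rows, y = [], []
    for z, x, wi, e in zip(Z, X, w, E):
        if e == 1:
            a = dot(z, gamma_hat)
            rows.append(x + [N01.pdf(a) / N01.cdf(a)])
            y.append(wi)
    return gamma_hat, ols(rows, y)
```

Calling `heckman_two_step(*simulate(4000, seed=1))` returns the probit coefficients and the wage equation coefficients, the last of which estimates the product of the error correlation and the wage error standard deviation (0.6 × 0.8 in this simulated design).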
We prefer to use the joint model as it makes the most efficient use of the available data. However, since the two-step approach has been used in many other studies, this paper presents both approaches to allow a comparison between the two sets of results to be made.
