4.2  Logistic regression models

Logistic or logit regression is applied when the dependent variable takes only the values zero and one (eg, we could assign one to those in good health and zero to those in poor health). The OLS model could in principle be applied in this case, with the continuous dependent variable on the left-hand side simply replaced by the binary variable. Such a linear probability model, while generally yielding the correct sign and significance of the coefficients, is not appropriate, for at least three reasons. First, the variance of the dependent variable is not independent of the values of the explanatory variables (the problem of heteroskedasticity). Second, the predicted probabilities can be negative or greater than one, which is nonsensical for a probability. Finally, these models assume that the marginal effects of the explanatory variables are constant. For these reasons it is preferable to employ a logit model.
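The second objection is easy to demonstrate. The following sketch (Python, with simulated data; the variable names and sample are illustrative, not drawn from the survey analysed here) fits a linear probability model by OLS and counts the fitted values that fall outside the unit interval:

    import numpy as np
    import statsmodels.api as sm

    # Simulate a binary outcome (1 = good health) driven by one covariate.
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 2.0 * x))))

    # Linear probability model: OLS on the binary dependent variable.
    lpm = sm.OLS(y, sm.add_constant(x)).fit()

    # Some fitted "probabilities" fall outside [0, 1] - the second of the
    # three objections raised above.
    out_of_range = (lpm.fittedvalues < 0) | (lpm.fittedvalues > 1)
    print(f"{out_of_range.sum()} of {len(x)} fitted values lie outside [0, 1]")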

Logit models are built on the notion of probability. Suppose the probability, denoted p, of some event is 0.8 (eg, the probability that a respondent is in good health). The probability that the respondent is not in good health is then simply 1 − p = 1 − 0.8 = 0.2. The odds of the event are defined as the ratio of the probability of “success” to the probability of “failure”:

$$ \text{odds} = \frac{p}{1-p} \tag{2} $$

In the above example, the odds are 0.8/(1 − 0.8) = 4. In other words, the odds of a person being in good health (relative to being in poor health) are four to one. Note that it is quite arbitrary which outcome is assigned to p and which to 1 − p.

By working with the logarithm of the odds we circumvent the problem of the restricted range of the probability. The transformation to logarithmic odds maps the underlying probability, whose range is from zero to one, into a variable with range from negative infinity to positive infinity. This is referred to as the logit transformation.[15]
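A minimal sketch of the transformation, applied to the worked example above (p = 0.8) and to probabilities near the ends of the unit interval:

    import numpy as np

    def logit(p):
        """Log odds: maps a probability in (0, 1) to (-inf, +inf)."""
        return np.log(p / (1 - p))

    # The worked example from the text: p = 0.8 gives odds of 4 to 1.
    p = 0.8
    print("odds:", p / (1 - p))        # 4.0
    print("log odds:", logit(p))       # ln(4), approximately 1.386

    # Probabilities near 0 and 1 map to large negative and positive logits.
    for q in (0.001, 0.5, 0.999):
        print(q, "->", logit(q))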

We now sketch the use of this transformation in the estimation of the coefficients associated with the independent or explanatory variables in the underlying model. Let p be the probability that a respondent belongs to the category of the dependent variable coded 1; all remaining respondents, who do not belong to this category, are coded zero. Then:

$$ p = \frac{1}{1 + e^{-Z}} \tag{3} $$

$$ Z = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k \tag{4} $$

where the $x_i$ are the explanatory variables and $\alpha$ and the $\beta_i$ are the coefficients to be estimated.

We can now show that Z is the log of the odds. Rearranging equation (3) to solve for Z yields:

$$ Z = \ln\!\left(\frac{p}{1-p}\right) \tag{5} $$
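Spelling out the intermediate algebra: multiplying both sides of (3) by $1 + e^{-Z}$ gives $p\,(1 + e^{-Z}) = 1$, so

$$ e^{-Z} = \frac{1-p}{p} \quad\Rightarrow\quad Z = -\ln\!\left(\frac{1-p}{p}\right) = \ln\!\left(\frac{p}{1-p}\right). $$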

Hence Z is the log odds or logit. Table 4-1 sets out the relation between probabilities, odds and log odds.

Table 4-1  The relation between probabilities, odds and log odds

    Probability (p)   Odds [p/(1-p)]   Log odds ln[p/(1-p)]
    0.001                 0.001            -6.907
    0.1                   0.111            -2.197
    0.15                  0.176            -1.735
    0.2                   0.250            -1.386
    0.25                  0.333            -1.099
    0.3                   0.429            -0.847
    0.35                  0.538            -0.619
    0.4                   0.667            -0.405
    0.45                  0.818            -0.201
    0.5                   1.000             0.000
    0.55                  1.222             0.201
    0.6                   1.500             0.405
    0.65                  1.857             0.619
    0.7                   2.333             0.847
    0.75                  3.000             1.099
    0.8                   4.000             1.386
    0.85                  5.667             1.735
    0.9                   9.000             2.197
    0.999               999.000             6.907
    0.9999             9999.000             9.210
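The entries in Table 4-1 follow directly from equations (2) and (5) and are straightforward to reproduce:

    import math

    # Reproduce Table 4-1: probability -> odds -> log odds.
    probs = [0.001, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5,
             0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.999, 0.9999]

    print(f"{'p':>8} {'odds':>10} {'log odds':>10}")
    for p in probs:
        odds = p / (1 - p)
        print(f"{p:>8} {odds:>10.3f} {math.log(odds):>10.3f}")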

We can now proceed to estimate equation (4), in this case using maximum likelihood estimation (MLE), an iterative procedure that searches for the values of α and the βs that maximise the probability of observing the sample values of the dependent variable.
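As a concrete sketch, the fragment below simulates data consistent with equations (3) and (4) and fits the logit model by MLE using Python's statsmodels package. The two regressors and the coefficient values are illustrative assumptions, not those of the underlying survey:

    import numpy as np
    import statsmodels.api as sm

    # Simulated data standing in for the survey sample (two hypothetical
    # explanatory variables; the paper's actual regressors differ).
    rng = np.random.default_rng(42)
    n = 1000
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    z = 0.3 + 1.2 * x1 - 0.8 * x2        # equation (4): Z = a + b1*x1 + b2*x2
    p = 1 / (1 + np.exp(-z))             # equation (3)
    y = rng.binomial(1, p)               # binary dependent variable

    # Maximum likelihood estimation of the logit model; the fit is an
    # iterative Newton-type optimisation of the log likelihood.
    X = sm.add_constant(np.column_stack([x1, x2]))
    result = sm.Logit(y, X).fit()
    print(result.summary())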

Notes

  • [15] An alternative approach, the probit transformation, is discussed below.