The Treasury

Using Integrated Administrative Data to Identify Youth Who Are at Risk of Poor Outcomes as Adults

6.2 The risk factors most associated with those outcomes (15-year-olds)

The regression modelling undertaken in Step 1 described above relates a large number of risk factors to each of the four outcome measures used. The statistical strength of this relationship can be assessed from the order in which the forward selection procedure selected the factors. This procedure adds factors to the model one at a time, at each stage choosing the factor that contributes the most additional explanatory power to the model.[20]
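As an illustration only, forward selection by score chi-squared statistic (the test described in footnote 20) might be sketched as follows. The sketch assumes a logistic regression with binary factors that each occupy a single column (one degree of freedom); the report does not specify the model form or software, and the names `forward_select`, `score_stat`, `make_data`, `factor_a`, `factor_b` and `factor_c` are hypothetical. The data are synthetic and constructed so that `factor_a` is the strongest predictor, `factor_b` is weaker, and `factor_c` is unrelated to the outcome.

```python
import math

CHI2_1DF_5PCT = 3.841  # 5% critical value of chi-squared with 1 degree of freedom


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * v for x, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def predict(rows, beta):
    """Fitted probabilities of a logistic model."""
    return [1.0 / (1.0 + math.exp(-sum(bj * xj for bj, xj in zip(beta, r))))
            for r in rows]


def information(rows, w):
    """Weighted cross-product matrix X'WX."""
    k = len(rows[0])
    return [[sum(wi * r[i] * r[j] for r, wi in zip(rows, w)) for j in range(k)]
            for i in range(k)]


def fit_logistic(rows, y, iters=25):
    """Newton-Raphson fit; each row starts with 1.0 for the intercept."""
    beta = [0.0] * len(rows[0])
    for _ in range(iters):
        p = predict(rows, beta)
        grad = [sum(r[j] * (yi - pi) for r, yi, pi in zip(rows, y, p))
                for j in range(len(beta))]
        info = information(rows, [pi * (1 - pi) for pi in p])
        beta = [bj + sj for bj, sj in zip(beta, solve(info, grad))]
    return predict(rows, beta)


def score_stat(rows, y, p, z):
    """Score chi-squared for adding column z to the model that produced p."""
    w = [pi * (1 - pi) for pi in p]
    u = sum(zi * (yi - pi) for zi, yi, pi in zip(z, y, p))
    i_xz = [sum(wi * r[j] * zi for r, wi, zi in zip(rows, w, z))
            for j in range(len(rows[0]))]
    i_zz = sum(wi * zi * zi for wi, zi in zip(w, z))
    adj = solve(information(rows, w), i_xz)
    var = i_zz - sum(a * b for a, b in zip(i_xz, adj))
    return u * u / var


def forward_select(factors, y):
    """Return factor names in the order they enter the model."""
    selected = []
    while len(selected) < len(factors):
        rows = [[1.0] + [factors[f][i] for f in selected] for i in range(len(y))]
        p = fit_logistic(rows, y)
        stats = {f: score_stat(rows, y, p, col)
                 for f, col in factors.items() if f not in selected}
        best = max(stats, key=stats.get)
        if stats[best] < CHI2_1DF_5PCT:  # nothing left is significant at 5%
            break
        selected.append(best)
    return selected


def make_data():
    """Synthetic data: 100 people per (a, b, c) cell; c has no effect."""
    positives = {(0, 0): 10, (1, 0): 50, (0, 1): 30, (1, 1): 70}
    factors = {"factor_a": [], "factor_b": [], "factor_c": []}
    y = []
    for (a, b), n_pos in positives.items():
        for c in (0, 1):
            for i in range(100):
                factors["factor_a"].append(float(a))
                factors["factor_b"].append(float(b))
                factors["factor_c"].append(float(c))
                y.append(1.0 if i < n_pos else 0.0)
    return factors, y


if __name__ == "__main__":
    factors, y = make_data()
    print(forward_select(factors, y))  # ['factor_a', 'factor_b']
```

In this synthetic example the order of entry mirrors the strength of association: `factor_a` enters first, `factor_b` second, and `factor_c` is never added because its score statistic stays below the 5% critical value.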

The factors listed in the A3 document are the five factors considered most predictive of each outcome, using age 15 as an example. These are the factors added earliest in the modelling procedure for the regression models of 15-year-olds (and which therefore explain the most variation in outcomes, conditional on the variables already added). They are compared across the models for females and males, with extra weight given to factors that are highly predictive for both. That said, all factors listed were significant predictors for both males and females. This list should be considered broadly indicative of the factors that are most important in predicting poor future outcomes for 15-year-olds.

The factors differ by outcome, but some are highly predictive across multiple outcomes. Being notified to CYF as a child was highly predictive of poor outcomes across all four domains, while ethnicity was significant across three domains (all except for having no level 2 qualifications by age 23), as was being stood down from school (all except for being on a benefit for more than five years). Having a caregiver with benefit receipt and/or low qualifications, receiving special education services and spending a long time on a benefit as a child were all highly predictive across two outcome areas.

These findings should be interpreted with some caution for a number of reasons. Most importantly, whilst the association between a factor and a future outcome means that that factor may be a useful predictor of future outcomes, it does not necessarily mean there is a causal relationship between the two.

Additionally, factors are identified as being highly predictive in the modelling only if they add something on top of the factors already selected for the model. Where a number of factors are highly correlated with each other, only one may be selected for the model, even though the underlying relationships may be complex. The correlated factors that are not selected may also be highly predictive in their own right and bear an important relationship to the outcome of interest.

Finally, we have a limited set of observed predictive factors we can use from administrative data. In many cases, these factors may merely be acting as a proxy for other, unobserved factors that we are unable to measure. As young people enter their adult years, more information becomes available that can be used to determine their risk of poor outcomes. In many cases, this is a direct early indicator of the outcome of interest (for example, long-term benefit receipt in the late teen years is a direct measure of early long-term benefit receipt, the 'economic opportunity' outcome measure).


  • [20] At each stage of the procedure, the process examines the score chi-squared statistic that each factor would have were it added individually to the existing model. The factor with the highest chi-squared score is added, and the procedure is repeated until no remaining factor has a chi-squared score that is statistically significant at the 5% level of significance.