Using Integrated Administrative Data to Identify Youth Who Are at Risk of Poor Outcomes as Adults

4.2 Predicting risk and defining populations with high risk

The regression modelling allowed us to construct, for each outcome of interest, an equation that could be used to assign each individual a risk score based on their age and gender as well as a wide range of other characteristics. Because individual data are anonymised, it is difficult to use an individual risk score to target services. For this reason, the main purpose of this study is to identify target populations at high risk of poor outcomes across our outcome domains, based on a small set of identifiable characteristics. The first step towards these target populations was to construct one or more measures of broader risk. We constructed two measures, which were used in the remaining analysis:

  • Risk across multiple poor outcomes – at each year of age, the population was ranked separately on its estimated risk score for each outcome. Each person's ranks were then averaged across the outcomes, and the population was ordered by this average rank. Using a somewhat arbitrary cut-off, the 5% of the population with the highest average ranks were defined as being at extreme risk, and the next 10% were defined as being at high risk.[18]
  • Extreme risk of at least one poor outcome – the ranks constructed in the previous process were used to identify the 5% of the population at greatest risk on each outcome measure, ie, at extreme risk of that outcome. A person was considered to be at risk if they were at extreme risk of at least one of the four outcomes.
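The two ranking procedures above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's actual code: the function names `ranks` and `classify` are hypothetical, the input is assumed to be one list of per-outcome risk scores per person for a single year of age (the caller would run it once per age), tied scores are assumed to be broken by input order (the report does not say how ties were handled), and the 5% and 10% shares are rounded to whole people.

```python
from statistics import mean


def ranks(scores):
    """Assign ranks 1..n so the highest score gets rank n (ties broken by input order)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, person in enumerate(order, start=1):
        result[person] = rank
    return result


def classify(risk_scores, extreme_share=0.05, high_share=0.10):
    """Hypothetical sketch of both risk measures for people of one year of age.

    risk_scores: list with one entry per person, each a list of risk scores
    (one per outcome). Returns (multi, any_extreme):
      multi       -- "extreme", "high" or "other" from the average-rank measure
      any_extreme -- True if the person is in the top share on any single outcome
    """
    n = len(risk_scores)
    n_outcomes = len(risk_scores[0])

    # Rank the whole population separately on each outcome.
    outcome_ranks = [ranks([p[k] for p in risk_scores]) for k in range(n_outcomes)]

    # Measure 1: average each person's ranks across outcomes, then order by that average.
    avg_rank = [mean(outcome_ranks[k][i] for k in range(n_outcomes)) for i in range(n)]
    by_avg = sorted(range(n), key=lambda i: avg_rank[i], reverse=True)
    n_extreme = round(n * extreme_share)
    n_high = round(n * high_share)
    multi = ["other"] * n
    for position, person in enumerate(by_avg):
        if position < n_extreme:
            multi[person] = "extreme"
        elif position < n_extreme + n_high:
            multi[person] = "high"

    # Measure 2: flag anyone whose rank on any outcome falls in the top n_extreme.
    cutoff = n - n_extreme
    any_extreme = [
        any(outcome_ranks[k][i] > cutoff for k in range(n_outcomes)) for i in range(n)
    ]
    return multi, any_extreme
```

The two measures can disagree: a person with an extreme score on one outcome but low scores on the others is flagged by the second measure while their averaged rank leaves them outside the top 15% on the first.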

The process of calculating risk scores and ranks and identifying general risk measures was repeated for both the 1990/91 birth cohort population and the December 2013 population. The descriptive analysis in the remainder of the report is based on the December 2013 population, although outcome measures and costs are inferred from the equivalent 1990/91 population, either by level of risk or by target population, as defined in the next section.
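The report does not spell out the inference step here, so the following is only one simple reading of it, offered as an assumption: average an observed outcome or cost within each risk group of the 1990/91 cohort, then apply those group averages to the number of people in the same group in the December 2013 population. The function names and the group labels are hypothetical, and every group in the 2013 population is assumed to also appear in the cohort.

```python
from collections import Counter, defaultdict


def mean_by_group(cohort):
    """Average an observed value (outcome indicator or cost) within each risk group.

    cohort: list of (risk_group, value) pairs from the 1990/91 birth cohort.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in cohort:
        totals[group] += value
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


def infer_totals(population_groups, group_means):
    """Hypothetical sketch: scale cohort group averages by 2013 group sizes.

    population_groups: one risk-group label per person in the 2013 population.
    Assumes every label also appears in group_means (a KeyError is raised otherwise).
    """
    sizes = Counter(population_groups)
    return {group: sizes[group] * group_means[group] for group in sizes}
```

For example, if the cohort's extreme-risk group averages a cost of 15 and the 2013 population contains two extreme-risk people, the inferred total for that group is 30.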


  • [18] Note that the high-risk population generally refers to the population meeting at least the definition of high risk, and includes those identified as being at extreme risk.
