Data Manipulation

There is usually a range of sources for population and employment data. Different sources have different levels of reliability and, in many cases, incompatible boundary regions. To make the best use of this data, and to ensure the highest degree of compatibility between the disparate sources, TransPosition has developed a new computer algorithm that combines the different sources to produce consistent population and employment projection series.

First, a hierarchy is put in place based on our degree of confidence in each source, which is generally higher for demographics defined on smaller regions. An example helps to explain this. The Australian Bureau of Statistics (ABS) reports its data by statistical area (SA). At the SA2 level, employment is reported for SA2 regions covering the whole state. Within an SA2 region, the specific locations of population and employment are not known; the employment is simply spread evenly over the region. SA1 regions are smaller, so we can be more confident about where the population and employment are located. Collector districts are smaller again, so the locations are known more accurately still. The point demographics are the best estimate we have of the exact locations of employment for certain industries. However, assumptions usually have to be made about the actual employment numbers at those points, so we need to constrain these numbers to the higher-level regions.

In the model, each node in the network has population and employment associated with it. Because the model works at a point level, a point in a collector district may also be contained in the SA1 and SA2 layers. Given the hierarchy described above, we can constrain the data to achieve consistency across all the regions.

Example

We show an example here for Queensland, Australia.

The order of confidence is set out below with 1 being the best data we have.

  1. Toowoomba Regional Council (TRC) - collector district (CD)
  2. South East Queensland (SEQ) - SA1
  3. Queensland (QLD) - SA1, SA2
  4. Point Demographics
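
The hierarchy and the way a single node can sit inside several layers at once can be pictured with a small data structure. The following is only an illustrative sketch: the layer labels, the Node class, the region identifiers and the function name are hypothetical names chosen for this example, not part of the TransPosition algorithm.

```python
from dataclasses import dataclass, field

# Confidence rank taken from the list above (1 = most trusted).
# The layer labels themselves are hypothetical names for this sketch.
CONFIDENCE_RANK = {
    "TRC_CD": 1,
    "SEQ_SA1": 2,
    "QLD_SA1": 3,
    "QLD_SA2": 3,
    "POINT": 4,
}


@dataclass
class Node:
    """A network node and the region that contains it in each layer."""
    node_id: str
    regions: dict = field(default_factory=dict)  # layer label -> region id


def layers_by_confidence(node):
    """Return the layers covering a node, most trusted first."""
    # Ties in rank are broken alphabetically so the order is deterministic.
    return sorted(node.regions, key=lambda layer: (CONFIDENCE_RANK[layer], layer))


# A hypothetical node inside the TRC area but outside SEQ.
node = Node("n1", {"QLD_SA2": "sa2_x", "TRC_CD": "cd_1", "QLD_SA1": "sa1_y"})
print(layers_by_confidence(node))  # ['TRC_CD', 'QLD_SA1', 'QLD_SA2']
```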

Collation process for employment

The steps for combining all the layers for employment are listed below.

  1. The TRC collector district employment projections are subtracted from the SEQ SA1 employment projections, since we are most confident in the TRC data.

  2. Whatever is left of the TRC collector district employment projections is then subtracted from the QLD SA2 employment projections, since some of TRC is not contained in SEQ.

  3. Whatever is left of the SEQ SA1 employment projections is then subtracted from the QLD SA2 employment projections. The QLD SA2 projections are the level we are least confident about, as an SA2 is a larger area than an SA1 or a collector district.

  4. The point demographic employment sites are first constrained by TRC CD, SEQ SA1 and QLD SA2 (in that order) so that the employment numbers at each point are never higher than the employment in the larger regions. For example, if a site had 50 mining employees but the TRC CD containing it had only 20, this step would factor the mining employment at that point down to 20 to be consistent with the higher-level (TRC CD) layer.

  5. Having constrained the point demographic employment numbers with the higher levels of data, we can place more trust in them. We then subtract the point demographic employment from the TRC, SEQ and QLD layers as before.
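
The constraining of point demographics in step 4 can be sketched as follows. This is a minimal illustration, not the actual implementation: the function names, the dictionary layout and the site and region identifiers are hypothetical, and the proportional scaling applied when several points share one region is an assumption, since the text only describes the single-site case.

```python
def constrain_points(points, region_totals, region_of):
    """Scale point employment so it never exceeds the containing region's total.

    points:        {point_id: employment} for one industry
    region_totals: {region_id: employment} for the same industry in one layer
    region_of:     {point_id: region_id}; points not listed are left unchanged
    Returns a new {point_id: employment} dict.
    """
    # Sum the point employment that falls inside each region.
    sums = {}
    for pid, emp in points.items():
        rid = region_of.get(pid)
        if rid is not None:
            sums[rid] = sums.get(rid, 0.0) + emp

    constrained = {}
    for pid, emp in points.items():
        rid = region_of.get(pid)
        if rid is None or rid not in region_totals:
            constrained[pid] = emp  # this layer does not cover the point
            continue
        cap, total = region_totals[rid], sums[rid]
        # Assumption: factor all points in the region down proportionally,
        # and only when together they exceed the region total.
        constrained[pid] = emp * cap / total if total > cap else emp
    return constrained


def constrain_by_layers(points, layers):
    """Apply constrain_points layer by layer, most trusted layer first."""
    for region_totals, region_of in layers:
        points = constrain_points(points, region_totals, region_of)
    return points


# The example from step 4: 50 mining employees at a site, 20 in the TRC CD.
mining = {"site_1": 50.0}
trc_cd = ({"cd_7": 20.0}, {"site_1": "cd_7"})
print(constrain_by_layers(mining, [trc_cd]))  # {'site_1': 20.0}
```

In a fuller version the list passed to constrain_by_layers would hold the TRC CD, SEQ SA1 and QLD SA2 layers in that order, with one call per industry.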

Collation process for population

Combining the layers for population is similar to the employment method, only simpler, as we do not have to deal with the point demographics. The steps for combining all the population information are listed below.

  1. The TRC collector district population projections are subtracted from the SEQ SA1 population projections, since we are most confident in the TRC data.

  2. The remaining TRC collector district population projections are then subtracted from the QLD SA1 population projections.

  3. Whatever is left of the SEQ SA1 population projections is then subtracted from the QLD SA1 population projections, which is the level we are least confident about.
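
A single subtraction step, such as step 1, might look like the sketch below. It is an illustration under stated assumptions rather than the actual algorithm: the function name, the dictionary layout and the region identifiers are hypothetical, the flooring of a region at zero is an assumption, and treating the "remaining" amount as whatever could not be subtracted at this layer is only one possible reading of the steps above.

```python
def subtract_layer(high, low, parent_of):
    """Subtract a trusted layer from the less trusted regions that contain it.

    high:      {region_id: amount} for the more trusted layer (e.g. TRC CD)
    low:       {region_id: amount} for the less trusted layer (e.g. SEQ SA1)
    parent_of: {high_region_id: low_region_id} where containment is known
    Returns (low_remaining, high_unallocated); the inputs are not modified.
    """
    low_remaining = dict(low)
    unallocated = {}
    for region, amount in high.items():
        parent = parent_of.get(region)
        if parent is None or parent not in low_remaining:
            # No containing region in this layer: carry the whole amount on.
            unallocated[region] = amount
            continue
        # Assumption: a region is never driven below zero.
        taken = min(amount, low_remaining[parent])
        low_remaining[parent] -= taken
        if amount > taken:
            unallocated[region] = amount - taken
    return low_remaining, unallocated


# Hypothetical figures: cd_a lies inside SEQ region s1, cd_b lies outside SEQ.
trc_cd = {"cd_a": 100.0, "cd_b": 40.0}
seq_sa1 = {"s1": 250.0}
seq_left, trc_left = subtract_layer(trc_cd, seq_sa1, {"cd_a": "s1"})
print(seq_left)  # {'s1': 150.0}
print(trc_left)  # {'cd_b': 40.0}
```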

Note that this approach is not guaranteed to give optimal projections, but it is the best method we have for keeping consistency across regions while dealing with the different data sources and our differing confidence in them. It is difficult to see how a more realistic approach could be developed without implementing a full land use supply/demand model, which is clearly beyond the scope of this work.