Friday, 1 June 2012

On subnational data - pick your black box

South Africa faces significant challenges such as low economic growth, high unemployment, high poverty and substantial inequality. I often argue that these problems, and their possible solutions, have a spatial dimension that is neglected. But to support local economic development, the public and private sectors need access to reliable sub-national data. Statistics South Africa collects and disseminates socio-economic data, but information about local economies is limited to two private sector databases: Global Insight's REX and Quantec's Regional indicators. Recently, one of my Master's students set out to compare the two databases and we found some interesting differences.

The first thing to note is that in both cases the data are derived or imputed. This in itself is not a problem - it is also the case with, for example, the EU's NUTS-3 data - but the questions are about how much source data actually exist and which assumptions are made to generate economic data at municipal level. It has been said that sub-national economic data in South Africa are not suitable for dynamic analysis because they are generated from aggregate GDP figures on the basis of a static algorithm. Our look at the data did not find a simple disaggregation of official national or provincial totals to the municipal level based on some fixed proportion, or on fixed growth rates over time. We did find some interesting differences in, for example, population numbers.
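To make the "fixed proportion" idea concrete, here is a minimal sketch of one way to check for it. It is not the procedure from the dissertation: the function names, the tolerance and the toy numbers are all made up for illustration. The idea is that if municipal figures were simply a constant slice of a larger total, each place's share of the total would not move from year to year.

```python
def share_ranges(municipal_gva):
    """Return, per place, the spread (max - min) of its share of the total over time.

    municipal_gva maps a place name to a list of yearly values
    (hypothetical layout, all lists the same length).
    """
    n_years = len(next(iter(municipal_gva.values())))
    totals = [sum(series[t] for series in municipal_gva.values())
              for t in range(n_years)]
    ranges = {}
    for place, series in municipal_gva.items():
        shares = [series[t] / totals[t] for t in range(n_years)]
        ranges[place] = max(shares) - min(shares)
    return ranges


def looks_statically_disaggregated(municipal_gva, tol=1e-6):
    """True if every place keeps (almost) the same share in every year.

    tol is an arbitrary illustrative threshold, not a recommended value.
    """
    return all(r <= tol for r in share_ranges(municipal_gva).values())


# Toy data: shares stay at 25% and 75% in every year.
static_example = {"A": [10, 20, 40], "B": [30, 60, 120]}
# Toy data: place A gains share over time.
dynamic_example = {"A": [10, 12, 15], "B": [30, 30, 30]}

print(looks_statically_disaggregated(static_example))
print(looks_statically_disaggregated(dynamic_example))
```

A check like this only rules out the crudest form of disaggregation; shares that move over time say nothing about whether the movements are right.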

There is hardly any way of knowing which database is more correct, so for the economic data we argued as follows. If you subscribe to the idea that agglomerations of economic activity are characterised by cumulative causation and path dependency, you would expect that over the short period for which data are available, some places would grow faster and others slower than the national average, but there would be persistence in relative positions and ranking. This is typically what the databases show. There is a lot more in the dissertation about the growth rates of GVA and different places' share of GVA, but the table below gives a brief summary of a test of rankings.


Each database shows internal consistency, but there are large (and significant) differences in rankings of places' share of GVA between the two databases.
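For readers who want to try a comparison of rankings themselves, the sketch below uses Spearman's rank correlation, which is one common choice. The post does not say which test the dissertation used, so treat this as an illustration only; the share figures are invented and the function names are my own.

```python
def average_ranks(values):
    """Rank values from largest (rank 1) to smallest, averaging ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented GVA shares for the same four places in two hypothetical sources.
shares_source_1 = [0.40, 0.30, 0.20, 0.10]
shares_source_2 = [0.35, 0.30, 0.25, 0.10]  # same ordering of places
shares_source_3 = [0.10, 0.20, 0.30, 0.40]  # reversed ordering

print(spearman(shares_source_1, shares_source_2))
print(spearman(shares_source_1, shares_source_3))
```

A value near 1 means the two sources rank places in much the same order; a value near -1 means the orderings are reversed. The same function can be applied within one database across years (persistence) or across the two databases for the same year (agreement).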

Our conclusion: There is no evidence that the private sector databases are a simple breakdown of national or provincial numbers. There are no exploding standard errors. But the databases are black boxes and they differ substantively. They should not be used together. It is a question of picking your black box. 

What we need is an academic, open source dataset - a resource that can be vetted, applied and improved by all users.
