Thursday night saw the great and the good, the interesting and interested, of UK planning come together for the 14th Google Firestarters on the theme of big data. Tim Harford, the renowned author, broadcaster and FT columnist, split his provocations into two TED-style twenty-minute talks (with discussion and questions in between) - one titled ‘Big Mistakes with Big Data’ and the second on ‘How to Tell the Future’. Tim spoke without slides and was an excellent storyteller, combining memorable anecdotes with deep knowledge and sharp insight into a complex and over-hyped subject. I have to say, I thought he was quite brilliant.
He began with one of the original and most emblematic poster-children for big data - Google Flu Trends. A number of years ago a team of researchers from Google announced (in one of the world’s top scientific journals) that, by tracking search terms and recognising the correlation between what people searched for and whether they had flu-like symptoms, they were able to follow the spread of influenza far faster than the Centers for Disease Control and Prevention could.
Flu Trends was built on so-called ‘found data’ (the digital exhaust of user behaviour online), which underpins much of the new internet economy. The promise of big data is that every single data point can be captured (what Tim termed 'n=all'), making older sampling techniques obsolete; that analysis of this data can produce highly accurate results; and that statistical correlation can tell us all we need to know, so we should worry less about understanding causation or generating theories to test.
The problem, however, is that this is often an optimistic oversimplification, and that ignoring 'old-fashioned, boring lessons about the way we should behave with data' creates potentially significant problems. And if we don't have a theory, then when things go wrong we're not able to say why. Often we might have n=all, 'but not the n=all that matters'. Tim used several entertaining case studies to show how failing to account for biases can lead to some pretty big, erroneous assumptions - including the high-profile story of how Target figured out a teenage girl was pregnant before her father did, which seemingly showed the power of big data but failed to acknowledge how many similar predictions had been made about teenage girls who weren't pregnant. When, after several winters of reliably anticipating flu outbreaks, Google Flu Trends suddenly became less dependable, it was because a reliance on correlation alone had underplayed what actually linked the search terms to the spread of flu - causation, which is inevitably much harder to get to. Google Flu Trends has since been recalibrated with fresh data and bounced back, but the episode illustrates that a reliance on correlation alone can be fragile.
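To make that fragility concrete, here's a toy simulation (my illustration, not Tim's, and nothing to do with Google's actual methodology): generate a thousand purely random 'search term' series, select whichever one correlates best with a flu-like signal over one year, and then watch that correlation evaporate on the following year's data.

```python
import numpy as np

rng = np.random.default_rng(0)

weeks = 104                              # two years of weekly data
flu = rng.normal(size=weeks)             # stand-in 'flu cases' signal (pure noise)
terms = rng.normal(size=(1000, weeks))   # 1,000 candidate 'search term' series

train, test = slice(0, 52), slice(52, 104)

# Pick the term that correlates best with flu over the first year...
in_sample = np.array([np.corrcoef(t[train], flu[train])[0, 1] for t in terms])
best = int(np.argmax(np.abs(in_sample)))

# ...then check how that same term does on the second year.
out_of_sample = np.corrcoef(terms[best][test], flu[test])[0, 1]
print(f"best in-sample correlation: {in_sample[best]:+.2f}")
print(f"same term out-of-sample:    {out_of_sample:+.2f}")
```

Because the 'winning' term was chosen purely for its chance fit, its apparent predictive power collapses on fresh data: a correlation found by trawling, with no causal story behind it, is exactly the kind of pattern that stops working.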
Tim finished his first talk by speaking about the problem of multiple comparisons ('ignoring the differences in differences'): test enough things against the same data and some will appear significant purely by chance, which some have said means that up to half of academic papers in some fields could be wrong. The solution, he said, is transparency in data sets.
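The mechanism is easy to demonstrate with a short sketch (again mine, not Tim's): run a standard significance test on 100 comparisons where no real effect exists, and at the conventional 5% threshold roughly five of them will come up 'significant' anyway.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)

trials, false_positives = 100, 0
for _ in range(trials):
    # Two groups drawn from the *same* distribution: no real difference exists.
    a = rng.normal(size=50)
    b = rng.normal(size=50)
    _, p = ttest_ind(a, b)
    if p < 0.05:                  # the conventional significance threshold
        false_positives += 1

print(f"{false_positives}/{trials} 'significant' results despite no real effect")
```

Report only the hits and quietly drop the misses, and noise starts to look like discovery - which is why transparency about the full set of comparisons matters so much.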
In his second talk, Tim spoke about 'how to see into the future', telling a fascinating story about the history of prediction and what it means for our own attempts at forecasting. Billions are spent on forecasts and many 'experts' get it wrong, and yet it is possible to make sensible predictions about the future. Tim told the absorbing story of two of the most famous economists in the world - Irving Fisher and John Maynard Keynes - who both believed they could make a lot of money from economic forecasting but had very contrasting fortunes. Both were caught out by the Wall Street crash of 1929, but whilst Keynes recovered and became rich, Fisher stubbornly believed in the veracity of his forecasts of recovery and continued to invest in the markets, eventually dying in poverty.
So forecasting is tough, but one of the biggest problems with it is that we don't keep score. In 1987 Philip Tetlock (a Canadian-born psychologist) 'planted a time-bomb' under the economic forecasting industry that wouldn't explode for another 18 years. Tetlock was at the time looking at what the social sciences could contribute to preventing a nuclear apocalypse, but rapidly became frustrated by the many contradictory positions taken by so-called experts, by how stubbornly those experts retained their points of view even in the face of contradictory evidence, and by how easy it was to explain away even failed forecasts with extenuating circumstances or excuses. Tetlock's response was to collect forecasts from hundreds of experts (eventually accumulating 27,500), establish clearly defined questions that would allow him to say definitively whether each forecast was right or wrong, and then wait 18 years.
The results, published in Expert Political Judgment, showed that most of these experts were terrible forecasters. But refusing to accept that the world is simply too complex to forecast, Tetlock last year set up a new, similar research programme that aggregated a large number (20,000) of quantifiable forecasts (like 'a tournament with thousands of contestants'). The Good Judgment Project incorporates a broad variety of forecasters alongside a series of experiments, which have already shown that brief training in how to put a probability on a forecast and correct for well-known biases improves results, and that working in forecasting teams (the most successful forecasters were put into teams where they could discuss and argue) produces better predictions. Furthermore, the project has shown that forecasting can work, and that some 'superforecasters' can predict events with a degree of accuracy far outstripping chance. It also found that the most successful forecasters were those who exhibited 'actively open-minded thinking' - in other words, those who were not afraid to change their minds in the face of fresh evidence and were happy to seek out contrasting views.
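Keeping score in this way is typically done with the Brier score - the mean squared gap between the probability a forecaster gave and what actually happened (Tetlock's tournaments famously use it, though the snippet below is my own minimal sketch rather than the project's code). It rewards calibration over bravado:

```python
import numpy as np

def brier(probabilities, outcomes):
    """Mean squared gap between forecast probabilities and what happened.
    0.0 is perfect; 0.25 is what always saying 50/50 earns; 1.0 is worst."""
    p = np.asarray(probabilities, dtype=float)
    o = np.asarray(outcomes, dtype=float)   # 1 if the event happened, else 0
    return float(np.mean((p - o) ** 2))

# Five resolvable yes/no questions; 1 means the event happened.
outcomes  = [1, 0, 1, 1, 0]
hedged    = [0.7, 0.3, 0.6, 0.8, 0.4]   # well calibrated, rarely extreme
confident = [1.0, 0.0, 0.0, 1.0, 1.0]   # bold calls, two of them badly wrong

print(f"calibrated forecaster: {brier(hedged, outcomes):.3f}")     # ~0.108
print(f"overconfident pundit:  {brier(confident, outcomes):.3f}")  # 0.400
```

Scored like this, a vague pundit can't be graded at all, but a probability attached to a dated, resolvable question can - which is precisely the discipline that makes keeping score possible.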
There is plenty more on that last talk in this excellent FT column that Tim penned on the theme (£), but for an audience of planners who are regularly exposed to promises about what big data can yield, and who are also perhaps mildly obsessed with prediction and the future, it was a fascinating couple of talks with lots to draw on.
So my thanks as always to Google for hosting, to Tim for such great provocation, and to those who came along. There is a Storify of the event you can see here, and you can see the Scriberia visualisation (below) of the talk in all its glory here. The next Google Firestarters will be in early 2015, so if you'd like to be notified when registration opens you can sign up for my newsletter for news of that.