Doing data journalism as a think tank
This is a chapter, completed in August 2015, from Data Journalism: Inside the Global Future (Abramis).
It's shortly after 10pm on Thursday 7 May 2015, and the only chart that matters is the one being projected onto BBC Broadcasting House. The exit poll for the 2015 General Election has just rendered weeks of hung parliament speculation pointless, Harriet Harman’s party lines obsolete and Paddy Ashdown’s hat edible: the Tories are on course for a workable minority, Labour for a disaster, the Lib Dems for near wipe-out, and the pollsters for some psephological soul-searching. My editor calls: he’s seen this before. The result is only going in one direction – a Tory majority. Without the luxury of a couple of weeks of coalition negotiation, my team and I will be live-blogging the formation of the next government rather earlier than expected and will need to prepare for an early morning.
As David Cameron leaves Buckingham Palace at around 1pm the next day, Prime Minister with a Conservative majority, we get to work. For three solid days, we analyse the ministerial moves as Coalition gives way to Conservative government in real time. But there are two unusual things about our live-blogging exploits. The first is that we're doing the whole thing in charts – 225 of them, around nine an hour – all of them in Excel and many updated instantly as ministers move into – or out of – government.
The second is that it wasn't my editor that called, but our deputy director. We don’t have an editor, because we are not a news organisation. We are the Institute for Government. And we are a think tank.
Charting government – the Whitehall Monitor project
The Institute for Government is an independent charity, a stone's throw away from both Westminster and Whitehall, that aims to help make government more effective. One of the ways we do this – alongside various research projects, public events and development work with politicians and civil servants – is through our Whitehall Monitor project. Whitehall Monitor is essentially data journalism – we take data published by and about government to build up a picture (literally) of the size, shape and performance of government. We publish an annual report, regular blogposts and occasional special reports. We aim to improve the way government uses and publishes data, as well as increasing understanding of what government currently looks like.
This matters. When people think of government departments (if they think of them at all), it is probably what they have in common that stands out – the department of this or the ministry of that, based somewhere in London SW1, funded by our taxes with a minister around the Cabinet table. But departments vary hugely in what they are, what they do, how they do it and how well they do it, something we hope our work will help politicians, civil servants, journalists, civil society and the wider public appreciate.
Live-blogging government reshuffles is something of a speciality for us – we did it in July 2014 as well as the three days in May 2015. It’s a fantastic opportunity to use the political drama of election results and ministerial moves to interest people in everything from the importance of ministerial stability (moving junior ministers around can affect whether policies are actually implemented) to how much departments have changed since 2010 (a lot, in some cases). Our most retweeted graphic was one on the disparity between vote share and seat share:
But much of the content on the blog used ministerial appointments to highlight the challenges the ministers running their new departments would face, and what the departments they were inheriting looked like.
Think tanks and data journalism
In a sense, what we are doing is what think tanks have always done: conducting quantitative research and publishing it. But after releasing Whitehall Monitor 2013, we looked at the wealth of data and analysis in it and wondered what else we could do to ensure it reached a wide audience and had some impact. We wanted our work to be not just on the web, but of the web. We spoke to a lot of people working with data. One remarked that what we were trying to do was akin to what 'some of the better newspapers are doing', and from then on, we consciously started to refer to what we did as ‘data journalism’.
The Guardian and the FT already had notable data journalism output. Ampp3d was pioneering punchy tabloid data journalism. Full Fact was charting as well as checking data. Buzzfeed was combining images with punchy titles and comment. In the US, Nate Silver’s FiveThirtyEight, Ezra Klein’s Vox, the Washington Post’s Wonkblog and New York Times’ The Upshot had a strong focus on data-driven ‘explanatory’ journalism.
Seeing what we were doing as ‘data journalism’ meant something different. It meant leading with some of the visuals and seeing them as stories in their own right rather than burying them deep in reports. It meant adopting more journalistic practices, such as working at speed and reacting to events. It meant connecting directly with the public, especially over social media, and aiming for more than just press coverage. And it meant shifting our entire organisational focus towards frequent and engaging blog posts rather than PDF reports (although hard copy annual reports can still reach an audience blog posts can’t).
‘Data journalism’ wasn’t without its critics. The US sites in particular faced criticism (some of it warranted) – a downside to the Upshot, if not unbridled FiveThirtyHate or a pox on Vox. But some of the problems they faced – caught between being too specialist for a general audience and too superficial for experts, the journalism being explanatory rather than investigative and proactive – were potentially advantages for us: we have an institutional expertise in the functions of government, and so much basic information about government was not being presented to a wider public. We also faced a possible internal problem: working in a different rhythm and using different skills to other projects in the Institute could have shut us off from colleagues. But our colleagues became really interested in what we’d done, and wanted to apply many of the same principles to their own work. We’ve now started rolling out data analysis and visualisation training across the Institute.
In September 2014, The Economist noted that think tanks, ‘the semi-academic institutions that come up with ideas for politicians’, were increasingly ‘doing journalism’. But it felt this was ‘more to promote ideas than to inform the public or expose wrongdoing’, and noted that ‘their policy papers are meant to be dry’. There’s no reason why any of that should be true. Indeed, a number of think tanks – such as Policy Exchange, The King’s Fund, the Resolution Foundation and the Institute for Fiscal Studies in the UK, and the Pew Center and Urban Institute in the US – have been increasingly using visuals to inform the public, as well as presenting and promoting their work, to great effect. These organisations may not see what they do quite so explicitly as ‘data journalism’ – but many of the practices are similar.
Armchair auditors, the fourth estate and ecosystems
Different think tanks will use different data sources and maybe create their own. We use open data from government (and the Office for National Statistics) for most of our work, helped by successive UK governments wanting to be a world leader in opening up data. Before he became Prime Minister, David Cameron looked forward to the 'army of armchair auditors' that would engage with expenses and government finances, and march towards a more transparent government and society. But, for the most part, the armchair auditors haven’t enlisted. Although it is a worthy aspiration that data should be published in a format anyone can use, there is no Excel field of dreams; you can build the spreadsheet, but they won’t necessarily come.
A better model would be to think of a new ‘fourth estate’, wider than just the press. The digital revolution has provided a far greater range of individuals and organisations with the opportunity and the tools to publish and do ‘journalism’ – it is no longer confined to those working for news organisations. Good journalism requires expertise, resources and time – something that think tanks and other civil society organisations (as well as some individuals) can bring to particular subjects, as well as journalism organisations.
It would be easy to get carried away with this idea. Traditional journalism – especially at a local level – is not in the best state, partly because of a failure to adapt to the digital age. A strong fourth estate should not be used as an excuse for government to stop using this data itself, or to abolish scrutiny institutions (as it did with the Audit Commission). Talented data journalism graduates are still likely to look to news organisations, not think tanks. Civil society organisations may come to data journalism with their own agenda (though so do traditional news organisations). Explanatory data journalism can only do so much – investigative journalism, still largely the preserve of news organisations, is still necessary to hold government to account.
And for all the creative opportunities – what Simon Rogers has likened to punk – there is a risk that those without the requisite expertise will produce work lacking in meaning and rigour. We are lucky at the Institute for Government to have built up a wealth of knowledge and understanding of government, which we can link to and draw upon, and which ensures we ask the right questions of the data. If it simply aggregates numbers into pretty but meaningless graphs without asking the right questions, data journalism can become the plural of anecdote journalism. And even though most of our work is done through widely-available Excel and we build our spreadsheets to turn around analysis as quickly as possible, cleaning and checking the data can still take time. Data journalism may be easier than ever before, but it’s still not easy.
Nonetheless, there is now more data and more people and organisations with the expertise to do something with it. The right way of thinking about all this might be as an ecosystem – competitive and collaborative, with the different actors feeding off one another’s work and constant conversation about how to publish, visualise and use data. A government department might publish the data; a think tank may do something with it and publish it directly to the public; a news organisation might pick up something directly from the department or think tank; members of the public or other organisations may find things to improve; and so on.
Data, information and evidence
At the heart of our data ecosystem is the data published by government. The Coalition prioritised open data publication for three broad reasons – government accountability to citizens, improving public services and catalysing economic and social growth – and it is the first that we have been most interested in (though with obvious implications for the second). We've started to think about it in three ways: data as data (the raw material), data as information (turned into something meaningful) and data as evidence (actually using it for something). Using those headings gives a sense of where things currently are, how we feel our work relates to it, and where others may also be able to learn and contribute.
Data as data
The UK government is seen as a world leader in open data, and is publishing more data than ever before. The number of datasets published by data.gov.uk increased from 9,498 in June 2013 to 19,834 in March 2015. We wouldn't be able to produce a 150-page annual report without it.
Even where the quality of the data leaves something to be desired, it is still better that it is published at all, and will hopefully improve with use. The new Minister for the Cabinet Office, Matt Hancock, has suggested the agenda will continue, saying that 'without [open data] governments bury their heads in the sand'. However, there are definitely improvements that can, and should, be made.
More data could be published – for example, around contracts between government and independent providers of goods and services and around why departments' spending plans have changed. The quality of data can often be improved, such as the data on Civil Service professions. The lack of unique identifiers for things like government departments, or a 'canonical register’ in the current Government Digital Service lexicon, makes data work much more difficult than it should be. There are still accessibility problems – for example, too much data is still being published in PDF (aka Pretty Damn Frustrating) rather than in spreadsheets. Important data should be available to all users and not just the most technically-proficient ones – to those who think .json sang with Kylie, and that ‘scraping’ is something think tank researchers do to the barrel of bad jokes in writing book chapters on data journalism.
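The accessibility point can be made concrete with a small sketch. The snippet below uses hypothetical, illustrative figures (not real departmental data) to show why publishing a table as CSV or JSON beats locking it in a PDF: both formats round-trip losslessly with a few lines of standard-library code, whereas a PDF table would need error-prone scraping to recover.

```python
import csv
import io
import json

# A hypothetical extract of departmental data -- illustrative numbers only.
rows = [
    {"department": "HM Treasury", "staff": 1800},
    {"department": "Cabinet Office", "staff": 2100},
]

# Publishing as CSV: any spreadsheet user can open this.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["department", "staff"])
writer.writeheader()
writer.writerows(rows)
as_csv = buf.getvalue()

# Publishing as JSON: trivially machine-readable for developers.
as_json = json.dumps(rows, indent=2)

# Either format round-trips cleanly -- no scraping required.
recovered = list(csv.DictReader(io.StringIO(as_csv)))
assert [r["department"] for r in recovered] == ["HM Treasury", "Cabinet Office"]
assert json.loads(as_json) == rows
```

Note that CSV carries everything as text (the staff counts come back as strings), which is one reason a machine-readable format with types, like JSON, is worth publishing alongside the spreadsheet.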
Running above all of this is how government organises the open data agenda. The appointment of Mike Bracken as the first chief data officer provides some opportunity to get this right. Continued political will is also vital. However, as Giuseppe Sollazzo, a former member of the Open Data User Group, has pointed out, the reduction in the number of open data advisory boards from four to zero over the last couple of years is not so positive – one hopes it is an outlier, rather than a trend.
As well as holding government to account for how it publishes data, we also need to think about how we do it ourselves. Many data journalism sites have been criticised for their lack of transparency and for not publishing their data. We say that government should show its working, make its data more accessible so that others can use it and generate their own insights, and publish so that others might improve its quality – we should practise what we preach.
We try to ensure our own data is open, reproducible, usable, updatable (i.e. ready to receive the latest data quickly), consistent, and portable. But most importantly, it should be published; to paraphrase George Orwell’s sixth rule of writing, break any of the above rules sooner than do something outright barbarous – which, here, would be not publishing at all. No doubt we fall short of this on occasions and we could do more, but in bringing our own data out of the bunker we hope to encourage others to do so and improve the data we use.
A final point on open data. Data is not naturally occurring – when someone produces a dataset, they have chosen which data to collect and how to collect it. It will have inherent biases and limitations. So even though open data being published proactively is to be welcomed, it cannot replace people – whether journalists, academics or other organisations – going out to collect their own data to test hypotheses, or reactive requests, such as those under the Freedom of Information Act (FOI). Recent developments on Freedom of Information – the announcement of a review, and the transfer of the policy to the Cabinet Office – should therefore concern those who care about accountability and transparency. It is to be hoped the Cabinet Office does not become a FOI extinguisher.
Data as information
Publishing lots of data is one thing – making it mean something, quite another. In an age of 'infobesity', with so much data being published, it is possible for things to be hidden in plain sight. What is needed is for data – the raw material – to be converted into information – something that actually means something. For our purposes, turning data into information usually means turning it into a data visualisation.
Data visualisation matters. It can bring to life important numbers and trends that would otherwise have remained hidden in the main text or ignored in data tables. Visualising the data is often the best way of telling the story. As William Playfair, a founding father of modern data visualisation, noted, numbers in a table are often like 'a figure imprinted on sand... soon totally erased' while charts ensure ‘a sufficiently distinct impression will be made, to remain unimpaired for a considerable time, and the idea which does remain will be simple and complete’.
But throwing numbers into an Excel spreadsheet and generating a random chart isn’t good enough, any more than throwing a random selection of letters onto a page would qualify as a rigorous, well-written research report. Bad data visualisation can – unintentionally, as well as intentionally – distort, mislead, obscure, confuse and force the reader to do much more work than they should have to.
As a result, we’ve spent a lot of time refining our practices into an Institute for Government style guide, an evolving document setting out standards and practices for data visualisation across the organisation. It is necessary both for branding – it’s important our images are consistent and easily identifiable as ours – and for understanding – our graphs should be telling their stories as clearly and accurately as possible, and all of the design choices involved in a chart can help or hinder this. When we talk about ‘telling a story’, we don’t, of course, mean fiction – our style guide balances simplicity and clarity with accuracy and integrity.
According to one study, the average reader online will spend around 15 seconds on a page. Simplicity and clarity should therefore allow a reader who spends 15 seconds looking at one of our graphs to understand it. A good ‘Twitter test’ is whether the image, tweeted out and divorced from any deeper analysis, would make sense to a reader all by itself. This (usually) means only showing or telling one clear story per graph. For example, rather than trying to show both current position and change over time in the same clustered bar chart:
We would show the current score in a bar chart, and the change over time in a separate ‘coloured spaghetti’ chart drawing out some key stories:
Or a series of small multiples showing the change for each department:
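Behind a small-multiples chart sits a simple reshaping step: a long table of one row per department per year becomes one short time series per department, each drawn as its own panel. A minimal sketch, using hypothetical department scores rather than our actual data:

```python
from collections import defaultdict

# Hypothetical long-format data: one row per department per year.
records = [
    ("DECC", 2012, 61), ("DECC", 2013, 64), ("DECC", 2014, 70),
    ("MoD",  2012, 58), ("MoD",  2013, 55), ("MoD",  2014, 52),
    ("DfT",  2012, 60), ("DfT",  2013, 62), ("DfT",  2014, 63),
]

# Reshape into one time series per department -- one small-multiple panel each.
panels = defaultdict(list)
for dept, year, score in sorted(records, key=lambda r: (r[0], r[1])):
    panels[dept].append((year, score))

# Order panels by latest score, largest first, so the grid itself tells a story.
panel_order = sorted(panels, key=lambda d: panels[d][-1][1], reverse=True)

assert panel_order == ["DECC", "DfT", "MoD"]
assert panels["MoD"] == [(2012, 58), (2013, 55), (2014, 52)]
```

The actual drawing (in Excel, or a plotting library) then becomes a loop over `panel_order`, one small chart per department on shared axes.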
Everything on the chart should aid the story, from the way the data is ordered (e.g. largest to smallest bar), to the title, to the minimising of gridlines, labels and other clutter. We try to let the data breathe – as Edward Tufte writes, ‘Above all else, show the data’. Data visualisation shouldn’t be about showing off the fancy things your computer can do, showing off your intelligence to your reader or neglecting the data at the expense of making something beautiful but meaningless. You want your reader to understand you – data visualisation is, in Alberto Cairo’s expression, a ‘functional art’.
Another of Orwell’s six golden rules for writing was to avoid cliché. Data visualisation often benefits from the opposite – conventional presentations, such as bar or line charts, are easily understood. But there are exceptions. Not all conventions should hold – if pie charts are the answer, you’re usually asking the wrong question (unless the question is ‘what form of chart is overused, often distorts the data and can usually be replaced by something more useful?’). And using an unconventional and arresting chart can make most subjects engaging. One of our most retweeted graphics is one on job grades in various government departments – the unconventional visualisation catches a reader’s attention and draws them in.
As for accuracy and integrity: the IfG has worked hard to develop a reputation for rigour, and a serious error could threaten this. We check our work as thoroughly as possible to avoid this – following data points from charts all the way back to the source, developing a list of 'gotchas' or common pitfalls to watch out for, publishing all of our working and being clear about any caveats. Where the data we are working with has problems (e.g. gaps or unreliability), that becomes part of the story. Data should not be distorted – whether intentionally to fit predetermined narratives, or unintentionally through bad design. We also have a ‘Twitter test’ for this: if the chart were retweeted without any further context, would it stand, accurately, by itself?
When we get all of this right, the data is turned into something meaningful. This can require a lot of work – thinking about what visuals are likely to work before a data release, making sure our spreadsheets are set up in a way to generate them quickly, and iterating to see what works and what doesn’t. In our reshuffle live-blogs, for example, we made sure data could be updated and the right graphs produced instantaneously, as well as having prepared lots of graphics – on previous ministries and giving context on individual departments – in advance.
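The live-blog set-up described above can be sketched as a tiny pipeline: each incoming appointment updates a running tally, and a chart-ready summary is regenerated immediately. The names and the `record_appointment`/`summarise` functions below are hypothetical illustrations of the idea, not our actual Excel workflow:

```python
from collections import Counter

# Running list of appointments, updated as the live-blog progresses.
appointments = []

def summarise(appointments):
    """Return (department, count) pairs, largest first -- ready to chart."""
    counts = Counter(a["department"] for a in appointments)
    return counts.most_common()

def record_appointment(minister, department):
    """Log a new ministerial move and regenerate the summary instantly."""
    appointments.append({"minister": minister, "department": department})
    return summarise(appointments)

record_appointment("A. Minister", "HM Treasury")
record_appointment("B. Minister", "Home Office")
summary = record_appointment("C. Minister", "HM Treasury")
assert summary == [("HM Treasury", 2), ("Home Office", 1)]
```

In practice our spreadsheets played the role of `summarise`: the charts were wired to ranges that recalculated as soon as a new row of data arrived.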
But there is a further step that can make data – and data journalism – even more powerful.
Data as evidence
The 'stat' or ‘performance stat’ model of government, associated with politicians like Maryland’s former Governor, Martin O’Malley, has become popular in the United States.
This has involved counties, cities and even states using data as a basis for running their administration – to understand what's going on, to set benchmarks, baselines and targets for public services, and to hold people to account. Data collection and publication is not just a ‘dog and pony’ show for the sake of it, but is actually used to govern. Data is only powerful when it is actually used – something which doesn’t always happen in the UK.
In our work, we have used data to go beyond the explanatory in a number of ways. For example, we can see whether the government is on course to meet its 'expectation' in reducing Civil Service staff numbers:
It isn’t, which surely signals the difficulty ahead in reducing the size of the state further.
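The kind of check behind such a chart is simple: compare actual headcounts against a straight-line path from the baseline to the target. A minimal sketch, using hypothetical figures rather than the real Civil Service numbers:

```python
def on_course(baseline, target, start_year, end_year, actuals):
    """Compare actual headcounts against a straight-line path to the target.

    actuals maps year -> headcount. Returns True only if every observed year
    is at or below the interpolated trajectory. Illustrative logic only.
    """
    span = end_year - start_year
    for year, actual in actuals.items():
        expected = baseline + (target - baseline) * (year - start_year) / span
        if actual > expected:
            return False
    return True

# Hypothetical: reduce from 480,000 (2010) to 380,000 (2015).
# The straight-line path implies 440,000 by 2012.
assert on_course(480_000, 380_000, 2010, 2015, {2012: 435_000})
assert not on_course(480_000, 380_000, 2010, 2015, {2012: 450_000})
```

A real version would also flag how far off course each year is, since the size of the gap – not just its existence – is the story.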
We’ve also used government data to try to work out the impact that government departments had on important policy areas between 2010 and 2015 according to their own ‘impact indicators’ – had they moved in the right direction?
We found that for some departments (like Energy and Climate Change), they had – for others (like Defence), they hadn’t. But the bigger discovery was that, given the quality of much of the data (where it was published at all) and the lack of baselines and benchmarks, it was highly unlikely that government was actually using the data to hold departments to account. This highlights not only that government could be using data in a much more powerful way, but also the benefits of data journalism in holding government to account.
Over the next few years, we hope to use data – and data journalism – much more in this way, and to encourage government to improve how it publishes and uses data to make government more effective. Hopefully, by the time we have to live-blog after the next General Election, we’ll have much more evidence of how government is performing and more evidence that it is using data to drive improvement.
And for all the excitement about producing data journalism as a think tank, we might also hope that, before long, neither data journalism nor think tank data journalism will seem like such an innovation. Hopefully, data journalism will be simply part of regular journalism – and journalism something that is much more mainstream for organisations like ours.