Data, transparency & corruption


[For the main story, click here.]

[Database update 1/22/2016]

Something I really like about data-driven news site FiveThirtyEight is its tendency to err on the side of transparency with the data it cites or creates. For example, in a recent interactive story about the evolution of language use on the website Reddit, the option to download the data behind each trend chart is displayed prominently, which is rare on other news sites.

That rarity makes a kind of business sense: being the first-to-market, go-to place for certain information is a very good way to make money, i.e. the Bloomberg model. Citing yourself as a source is a hybrid of self-preservation and self-promotion, and to be fair it is a logical one, given that publicly accessible coverage from Bloomberg - much of which is truly excellent reportage done by talented, hard-working journalists - is in many ways advertising for the Bloomberg terminal service. Hence this little notice, which now runs at the bottom of every Bloomberg story:

But when a Bloomberg story, for instance, cites "Bloomberg Data" as the only origin for the data in one of its charts, it doesn't just reinforce the position of said financial news and data service as an authoritative source; it can also make verifying the factual assertions baked into that chart and the accompanying story nearly impossible. That obfuscation runs counter to the essence of public-facing journalism as an endeavor of verification.

The FiveThirtyEight approach I've described - while not practiced 100% of the time - rests on the recognition of another fact: that making data available to everyone who might want it for further study or vetting can bolster the fundamentals of journalism as a profession by promoting greater accountability, more responsive coverage and deeper understanding. That's one reason we're making the data from our latest feature on China's anti-corruption campaign available for public download via Google Sheets:

[Click to access the Google Sheets file.]

Included are the major data series referenced in the article, as well as links to all the monthly and annual reports that form the basis of the database. It's not exactly groundbreaking by current standards, but CER has always been a bit slow to enter the journalistic present. Hopefully this helps push it another small step in the right direction. 

Author: Hudson Lockett (@KangHexin)