Jonathan Stray is the man behind Overview, a project from the Associated Press that aims to help journalists find stories in large quantities of documents. The tool uses keyword searches to automatically sort documents according to topic, making patterns and trends far easier to spot. Jonathan is a fellow at the Tow Center for Digital Journalism at Columbia University, teaching and researching computational journalism. He developed Overview with a Knight News Challenge grant. Jonathan will speak in a session on successful tools and examples of investigative data journalism.
Overview is an impressive tool for journalists engaged in what you describe as “document-driven journalism.” Is it bringing stories to light that would otherwise have remained hidden? Are there any stories you are particularly excited about?
There’s a constant flow of stories based on documents. Right now, a coalition of journalists is going through thousands of documents left by former Ukrainian President Yanukovych. Last year, the International Consortium of Investigative Journalists began publishing its Offshore Leaks project, based on 2.5 million leaked offshore tax documents. And of course we’re still seeing reporting from the NSA files.
Those are famous stories, but even small news organizations eventually have to deal with a mass of documents, often the result of a Freedom of Information request. Our data shows that the median document set size in journalism is 9,000 pages, which would take weeks to read without a computer. This is why we created Overview. You can find stories in masses of documents within hours, even if you’re not sure what to look for.
How important is transparency in data-driven or document-driven journalism? Is it enough to share the findings, or should reporters also share the original data and documents?
Reporters should share their source material whenever possible because it lends credibility to their work and allows other people to build on it. We insist on open data from governments, so why shouldn’t we insist on open data from journalists? The DocumentCloud platform, which is free for journalists, is designed to make sharing your documents easy. But sometimes it’s not possible to share everything, because of security, legal, or ethical issues. Strangely, journalists must support both transparency and secrecy. Actually, everyone who supports transparency has this problem: if you’re going to be more transparent, you also need to be much clearer about what is really a secret, and why.
You discussed the idea of journalistic objectivity in a 2013 article. What do you think about the suggestion that “transparency is the new objectivity”?
Transparency is important, but it is not a complete replacement for objectivity. Objectivity is a complicated word that includes a lot of different parts. When people say “transparency is the new objectivity” they are getting at the idea that it’s ok to have a point of view in your reporting, and that it’s better to be honest about who you are and what you believe. I think there’s a lot of value in that idea, but it doesn’t mean that you can write anything you want. I think we all still insist on basic accuracy, for example. Also, transparency doesn’t mean you get to leave out the parts of the story that you disagree with, or use quotes in a way that twists their meanings.
So transparency alone is not enough. We also need things like accuracy, comprehensiveness, and fairness. I wish people would stop debating the meaning of the word “objectivity” and instead talk about the many qualities that we expect in good journalism.
Have you personally faced dilemmas over transparency in your work as a journalist?
Yes. As a journalist serving the public, it’s my job to get as much on the record as possible. As a human being talking to a source, I quickly realize that certain information might be damaging. These two obligations are often in conflict, and I’ve had to make choices about what is important.
How significant a role would you say online comments and social media have played in making journalists more accountable to their readers? Do you think one has played a bigger role than the other?
I view online comments and social media as different forms of the same thing: users being able to talk back to the journalist, and to each other. I think this is incredibly valuable. One thing it has done is raise the standard for journalism, because there are always experts in your audience who will instantly see if you get something wrong. (This was clear even ten years ago, when Jay Rosen articulated the idea as “my readers know more than I do.”)
I think the journalist has two ways to deal with this: either do very, very thorough work so you’re never wrong, or change the way you expect to interact with your audience. What makes the journalist so special? Why does their voice matter so much more than everyone else’s? Humility and the willingness to be honest about what you don’t know are important personality traits for the modern journalist.