Don't let statistics scare you

By Tony Van Witsen

Statistics are an essential part of journalism, yet it’s surprising how often journalists claim to hate numbers. Or so the myth says.
In fact, one recent survey showed journalists’ confidence in their mathematical ability was about average, neither especially high nor especially low. (See? You can’t even talk authoritatively about how journalists feel about numbers without resorting to more numbers.)
Data is everywhere these days, and cheap computing power makes it accessible to journalists who increasingly mine public databases for news.
Regardless of journalists’ personal preferences, the profession seems to be moving inexorably toward ever greater use of data and statistics in reporting.
This is one focus of my current research: How do journalists think about and use numbers and statistics in their daily work?
The reliance on numbers is especially pronounced in environmental reporting. Environmental problems are almost always based on measured, scientifically verifiable evidence, and reporters who cover these issues have no choice but to understand how the numbers work.
So I had a special interest in attending the Society of Environmental Journalists’ craft tutorial “Data Journalism: How to Find It, Mine It, Animate It.” This session at the organization’s recent annual conference seemed aimed at those who need to master big data but don’t quite know how to go about it.
Beginning with the basics of how to find data (or request it when it is not already public), this three-hour session took journalists through the process of handling data in Google Sheets (a kind of shared version of Excel). It then progressed to the use of public search tools such as Google Scholar, Google Trends and Google Knowledge, which assist with data searches by providing a visual structure of how data sources are linked.
Using their laptops, participants learned by contributing to a group project to find data on environmental topics.
We started with the EPA’s rich range of data, including its Toxics Release Inventory dataset and its Greenhouse Gas Reporting Program. This trove doesn’t just sit on a website; it can be downloaded to the user’s own computer, where it’s possible to extract new insights, and stories, by comparing EPA data with other data sets.
Among the workshop’s most valuable offerings were examples of data-based environmental stories, with explanations of how they were done, which participants could take home and build on in their own newsrooms.
Another tutorial offered insights into how to think usefully about data. This problem surfaces in my research again and again.
Numbers can look so impressive, so coldly and neutrally authoritative in a spreadsheet, that many journalists fall into the habit of thinking they represent some kind of transcendent “truth” rather than the products of an imperfect, human-created system for counting things.
It’s important to apply journalistic skepticism in such cases and ask, ask, ask.
Where did the numbers come from? How were they collected? How were the categories defined and why were they defined that way? How complete are the numbers? What’s missing? What wasn’t reported because it wasn’t asked? How different is this database from another one and do they overlap? If the information I’m seeking is missing, is there another way to find it?
These basic reporting questions can easily get lost, especially for the innumerates or numberphobics who might be so overawed that they cannot recognize numbers as useful but flawed human artifacts. Seen in this light, the session encompassed ideas and debate as well as technical skills.
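For readers who want to see what a couple of those questions look like in practice, here is a minimal sketch in Python. It is not something shown at the session. It takes two of the questions above (how complete are the numbers, and do two databases overlap?) and turns them into quick checks. The rows, facility IDs and column names are all invented for illustration; they are not real EPA data, and a real download would have its own layout.

```python
import csv
import io

# Invented sample rows for illustration only; NOT real EPA data.
# Column names (facility_id, pounds_released, co2e_tons) are assumptions.
RELEASES_CSV = """facility_id,county,pounds_released
F001,Adams,1200
F002,Baker,
F003,Clark,310
"""

EMISSIONS_CSV = """facility_id,co2e_tons
F002,5400
F003,880
F004,150
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def missing_count(rows, column):
    """Count rows where the given column is absent or blank."""
    return sum(1 for row in rows if not (row.get(column) or "").strip())


def overlap_report(rows_a, rows_b, key):
    """Show which key values appear in both sets or in only one of them."""
    keys_a = {row[key] for row in rows_a}
    keys_b = {row[key] for row in rows_b}
    return {
        "both": sorted(keys_a & keys_b),
        "only_first": sorted(keys_a - keys_b),
        "only_second": sorted(keys_b - keys_a),
    }


releases = load_rows(RELEASES_CSV)
emissions = load_rows(EMISSIONS_CSV)

# How complete are the numbers? What's missing?
blank_releases = missing_count(releases, "pounds_released")

# How different is this database from another one, and do they overlap?
report = overlap_report(releases, emissions, "facility_id")

print("Rows with no release figure:", blank_releases)
print("Facilities in both data sets:", report["both"])
print("Only in the first data set:", report["only_first"])
print("Only in the second data set:", report["only_second"])
```

A blank cell or a facility that shows up in only one data set is not an answer in itself. It is a prompt to go back to the source and ask why.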
My own research has shown me that neither the optimists (who see a golden age of data journalism around the corner) nor the pessimists (who think journalists always tend to mess numbers up) have it entirely right.

