Making Words Count
A tipsheet by David Poulson, associate director, Knight Center for Environmental Journalism
When Darren Samuelsohn heard “global climate change” during January’s State of the Union address, he suspected it was the first time the president had uttered the phrase in any of his annual assessments of the country.
The Greenwire senior reporter returned to his office and verified his hunch by combing through the six others. His story was the first to lead with the fact that 2007 was the first year Bush mentioned global climate change in a State of the Union.
“This was a big deal,” Samuelsohn said. “While Bush may not have made any major policy reversals on mandatory caps, it put him on record on national TV and before the new Democratic Congress as saying this is a priority for his administration.”
It took Samuelsohn about 30 minutes to cut and paste the texts of the past speeches into a Word document and scan them to make sure he was right. But there is an easier way for reporters on deadline to count how often words appear in the State of the Union.
What’s more, there are easy techniques for quickly counting the incidence of certain words in speeches given by anyone – your state environmental department chief, the leader of an environmental group, the mayor, school superintendent, police chief, governor.
It’s an analysis that may help a reporter read the tea leaves for shifts in policy or priorities. At a minimum, it provides a fun entry point and fodder for a graphic to spice up what may be a dull speech story.
For the State of the Union, check out http://style.org/stateoftheunion/parse/. It’s a nifty parsing tool for counting words. The side-by-side comparison of each of Bush’s speeches shows the evolution of which subjects are emphasized. Check out words like terror, terrorism, Iraq and war to see how often they appear each year.
You can do the same thing with environment-related words and phrases – energy, ethanol, pollution, nuclear power, global warming. Or contrast the incidence of words like war and peace or drugs and education.
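The same kind of side-by-side comparison can be approximated for any set of speeches with a short script. What follows is a minimal sketch in Python, not the method the parsing tool itself uses. The function name term_count, the labels "Speech A" and "Speech B" and the sample sentences are all invented for illustration and are not quotes from any real address.

```python
import re

def term_count(text, term):
    """Count whole-word, case-insensitive matches of a single-word term."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

# Invented stand-in text; in practice, paste in the full text of each speech.
speeches = {
    "Speech A": "We will invest in energy. Oil prices hurt families, and oil imports keep rising.",
    "Speech B": "Energy security matters. Cleaner energy will help us address the climate.",
}
terms = ["energy", "oil", "climate"]

# Build a small table: one row of counts per speech.
table = {
    label: {term: term_count(text, term) for term in terms}
    for label, text in speeches.items()
}

# Print the terms down the side and the speeches across the top.
print("term".ljust(10) + "".join(label.rjust(10) for label in speeches))
for term in terms:
    row = "".join(str(table[label][term]).rjust(10) for label in speeches)
    print(term.ljust(10) + row)
```

Raw counts favor longer speeches. If the speeches differ a lot in length, consider dividing each count by the total number of words before comparing.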
That’s handy. But most reporters have a greater need to analyze local speeches. There are two techniques for doing this quickly. One involves simple spreadsheet skills. The other uses a speedier Web-based tool, but you don’t get the satisfaction – and the security – of doing the work yourself.
First, the spreadsheet technique:
1. Paste the text into Microsoft Word. You may want to highlight it and go to edit/clear/formats to get rid of extraneous formatting, particularly if you copied it off the Web.
2. Call up the search and replace function (control h on PCs; on Macs, choose Replace from the Edit menu) and replace each punctuation mark, one at a time, with nothing by leaving the “Replace With” box empty.
3. Now replace spaces (hit the spacebar once) with paragraph marks (^p). That will put each word on a separate line.
4. Paste the result into a Microsoft Excel spreadsheet under a column labeled Words.
5. Run a pivot table to find how many times each word appears. Sort the results in descending order. Don’t be intimidated by pivot tables. Just:
– Highlight the entire column including the header and go to Data/PivotTable and PivotChart Report.
– Click the “next” button in the first wizard window. Click “next” in the next dialogue box. Now click the “layout” button.
– Drag the “Words” button into the row area of the chart. Drag the “Words” button again, but this time drop it into the data area. It will change to “Count of Words.”
– Click OK and then finish. To put the incidence of words in order, double click on the gray box behind the Words column header. Click on advanced. Under “AutoSort options,” check descending. Under “Using field,” click on the drop-down arrow to sort by “Count of Words.” Click OK and OK again. The most frequently used words appear at the top of the list.
6. Just ignore words like the, and, or, it, they, he, she and others that are not so interesting. Or use search and replace to get rid of such words before pasting your text into a spreadsheet.
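If you are comfortable with a little scripting, the spreadsheet steps above can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not a replacement for checking the results yourself. The function name word_counts, the short stop-word list and the sample sentence are all made up for illustration, and the code assumes plain English text.

```python
import re
from collections import Counter

# Hypothetical stop-word list; add whatever words you find uninteresting.
STOP_WORDS = {"the", "and", "or", "it", "they", "he", "she",
              "a", "an", "of", "to", "in", "is", "that"}

def word_counts(text, stop_words=STOP_WORDS):
    """Return a Counter of lowercase words, skipping the stop words."""
    # Lowercase so "Energy" and "energy" are counted together, and turn
    # curly apostrophes into straight ones so "don’t" stays one word.
    cleaned = text.lower().replace("\u2019", "'")
    # Pull out runs of letters and apostrophes; punctuation is dropped.
    words = [w.strip("'") for w in re.findall(r"[a-z']+", cleaned)]
    return Counter(w for w in words if w and w not in stop_words)

# Made-up sample text; paste in a real speech instead.
sample = "Energy is the issue. The energy we use, and the energy we waste."
counts = word_counts(sample)

# List the most frequent words first, like the sorted pivot table.
for word, n in counts.most_common(3):
    print(word, n)
```

Counter.most_common() does the job of the sorted pivot table, and the stop-word set stands in for step 6.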
For a faster automated process, go to http://www.georgetown.edu/faculty/ballc/webtools/web_freqs.html. Just paste the text into this tool developed by Georgetown University and let it rip. You can sort alphabetically or by frequency.
If you just want to count the incidence of a particular word or phrase, you can always do a search for it in Word and replace it with itself, which leaves the text unchanged. A dialogue box will tell you how often the substitution was made.
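The single-word-or-phrase count can also be scripted. Here is a rough sketch in Python; count_phrase is a hypothetical helper name and the sample text is invented. It assumes you want whole-word matches, so a search for “war” does not also pick up “warming.”

```python
import re

def count_phrase(text, phrase):
    """Count case-insensitive, whole-word occurrences of a word or phrase."""
    # Allow any run of whitespace between the words of the phrase, so a
    # line break in the middle of "global warming" still counts.
    parts = [re.escape(p) for p in phrase.split()]
    pattern = r"\b" + r"\s+".join(parts) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

# Invented sample text for illustration only.
sample = ("Global warming is real. We must confront global\n"
          "warming. Warming trends continue.")

print(count_phrase(sample, "global warming"))
print(count_phrase(sample, "war"))
```

If you would rather count partial matches too, drop the two r"\b" word-boundary markers from the pattern.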
There is a legitimate argument over whether the number of times something gets mentioned in a speech represents the priorities of the speaker. A word count might be an objective indication of emphasis and perhaps policy shift. But you’ll need your reporter’s brain to provide the proper context.
And it doesn’t all have to be heavy-duty analysis. Use the same techniques to find out how often someone’s favorite buzzword or phrase pops up in a speech. Even they may be surprised at how they litter their prose with the same words.
Word counts lend themselves well to graphical presentation. The New York Times used circles of varying size and divided them into categories – domestic affairs, taxes and the economy, terrorism and foreign affairs – to visually depict the frequency of words used in the 2007 State of the Union. In 2004, the Times used similar circles to depict the incidence of 20,000 words spoken by politicians at both party conventions.
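If you want to try a circle chart of your own, one common convention is to make each circle’s area, not its radius, proportional to the count, so a word used four times as often gets a circle with twice the radius. The sketch below assumes that convention; how the Times sized its circles is not described here. The function name circle_radii, the max_radius setting and the counts are all made up for illustration.

```python
import math

def circle_radii(counts, max_radius=50.0):
    """Map word counts to radii so each circle's area tracks its count."""
    biggest = max(counts.values())
    # Area grows with the square of the radius, so take the square root
    # of each count's share of the largest count.
    return {word: max_radius * math.sqrt(n / biggest)
            for word, n in counts.items()}

# Made-up counts, not taken from any real speech.
counts = {"war": 36, "energy": 9, "health": 4}
radii = circle_radii(counts)

for word, radius in radii.items():
    print(f"{word}: radius {radius:.1f}")
```

Feed the radii to whatever drawing or charting tool your graphics desk uses.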
“It doesn’t take a rocket scientist to look at one of these circle charts and figure out what a politician’s priorities are by the words they use,” said Karl Gude, the former information graphics editor at Newsweek who now teaches at Michigan State University. “And that’s just what I love about them. They convert a daunting amount of data into a simple and instant read.”
If nothing else, counting words is a lot more interesting than the old staple of counting how often a speech is interrupted by applause.
David Poulson teaches environmental journalism and computer-assisted reporting at Michigan State University’s Knight Center for Environmental Journalism.