Cool data tools and resources from NICAR13

Do you want to find and tell compelling stories that are hidden in mounds of data? Then you've got a lot in common with attendees at last week's National Institute for Computer-Assisted Reporting conference, a gathering of developers, data analysts, designers, writers and editors who were sharing tips, tools and resources for dealing with data.

Some highlights from the four-day nerdfest in Louisville:

Data analysis with R

Why use a command-line tool to analyze and visualize your data? Reproducibility is one good reason: If you've got all your analysis steps saved in code, you can easily go through those same steps again -- on the original data set or any other similar data, said Hadley Wickham, chief scientist at RStudio and creator of the popular ggplot2 visualization package for R.

If you're interested in learning R, check out Wickham's slides and code from a day-long NICAR class, available at http://bit.ly/nicarnerds. R is an open-source data-analysis platform formally known as the R Project for Statistical Computing.

Google Fusion Tables as a free back-end database

Google's Fusion Tables tool has long been used for mapping, but recent additions to the free service make it useful as a back-end database for sharing data among teams. You can filter data, summarize with results similar to a pivot table (from the main menu, use Tools -> Summarize) and create "card" views of your data with the option of changing which fields display and how.
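To make the pivot-table idea concrete, here is a minimal Python sketch of what a "summarize" step does: group rows by one column, then count and total another. It is not Fusion Tables code and does not use its API; the rows, the column names and the summarize() helper are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows, invented for illustration only.
rows = [
    {"county": "Adams", "category": "Roads", "amount": 1200},
    {"county": "Adams", "category": "Schools", "amount": 800},
    {"county": "Baker", "category": "Roads", "amount": 500},
    {"county": "Adams", "category": "Roads", "amount": 300},
]


def summarize(rows, group_by, value):
    """Return {group: {"count": n, "sum": total}} for one grouping column."""
    summary = defaultdict(lambda: {"count": 0, "sum": 0})
    for row in rows:
        bucket = summary[row[group_by]]
        bucket["count"] += 1
        bucket["sum"] += row[value]
    return dict(summary)


by_county = summarize(rows, "county", "amount")
for county, stats in sorted(by_county.items()):
    print(county, stats["count"], stats["sum"])
```

Changing the group_by argument to "category" gives the same kind of summary along a different column, which is roughly what switching the row field in a pivot table does.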

For more, see Data sharing, visualization and app building with Fusion Tables by Google's Sree Balakrishnan.

Fusion Tables' mapping tools are also more robust, with a Fusion Tables Layer Wizard that can include SQL-like searches on your maps.

Mapping

And speaking of mapping, there was some great advice on how to map data -- not simply what tools to use and how to plot your points and polygons, but important considerations on how to analyze your data before tossing it onto a map. Two of the most useful presentations and posts:

Mapping Best Practices -- offers advice such as: Just because you can make a map doesn't mean you should. In other words, not all data with a geographic component is most helpfully displayed by plotting on a map. And: In many cases, you'll probably want to avoid mapping raw numbers onto regions but instead want to standardize in some way. For example, mapping crime rates usually makes more sense than total numbers of crimes per region. Otherwise you may simply be creating a population density map. By Dave Cole (MapBox), John Keefe (WNYC) and Matt Stiles (NPR).

Talking about crime: The Chicago Crime site -- explains how the Chicago Tribune deals with mapping crime data. "Interpreting crime data is tricky business, and developing coherent narratives and useful metrics is even harder," the post notes -- along with an outline of decisions they made on presenting that data. By David Eads on the tribapps team.
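The advice about standardizing before you map can be shown with a little arithmetic. The Python sketch below uses two invented regions (the names, crime counts and populations are hypothetical, not from any presentation) and compares a ranking by raw counts with a ranking by crimes per 100,000 residents.

```python
def rate_per_100k(count, population):
    """Convert a raw count into a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


# Hypothetical figures, invented for illustration only.
regions = {
    "Region A": {"crimes": 500, "population": 250_000},
    "Region B": {"crimes": 120, "population": 20_000},
}

rates = {
    name: rate_per_100k(data["crimes"], data["population"])
    for name, data in regions.items()
}

# Rank the same regions two ways: by raw count and by rate.
by_count = sorted(regions, key=lambda name: regions[name]["crimes"], reverse=True)
by_rate = sorted(rates, key=rates.get, reverse=True)

print("By raw count:", by_count)
print("By rate per 100,000:", by_rate)
```

In this made-up case the larger region leads on raw counts while the smaller one has three times the rate, so a map shaded by totals would mostly show where people live.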

Analyzing influence

NodeXL is a free Excel plug-in for social network analysis -- not seeing who's connected on Facebook or Twitter, but looking at a group of people to find the top influencers and who's got the most connections. Presenter Peter Aldhous from New Scientist showed an example of stem-cell researchers referring to each other's work in papers, and then had session attendees analyze U.S. Senate voting data from 2007. The network graphing exercise featured data on how each senator's votes compared with every other senator's. Results? Senators clustered in two groups by party, except for three in between: Arlen Specter, Susan Collins and Olympia Snowe. If you'd like to try that exercise, see Aldhous's NodeXL for Network Analysis PDF. Data is available here.
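For a sense of what sits underneath a voting-similarity graph, here is a rough Python sketch. It is not NodeXL and not the session's method or data: the four voters, their votes and the 0.75 threshold are all invented. It scores each pair by the share of votes on which they agreed, links pairs at or above the threshold, and counts each voter's connections.

```python
from itertools import combinations

# Hypothetical voters and votes (True = yea), invented for illustration;
# this is not the 2007 Senate data used in the session.
votes = {
    "A": [True, True, False, True],
    "B": [True, True, False, False],
    "C": [False, False, True, False],
    "D": [False, False, True, True],
}


def agreement(x, y):
    """Share of votes on which two voters cast the same vote.

    Assumes both lists are non-empty and cover the same votes in order.
    """
    return sum(a == b for a, b in zip(x, y)) / len(x)


def build_edges(votes, threshold):
    """Link every pair whose agreement is at or above the threshold."""
    edges = {}
    for p, q in combinations(sorted(votes), 2):
        score = agreement(votes[p], votes[q])
        if score >= threshold:
            edges[(p, q)] = score
    return edges


edges = build_edges(votes, threshold=0.75)

# Degree = number of connections per voter, a simple measure of centrality.
degree = {name: 0 for name in votes}
for p, q in edges:
    degree[p] += 1
    degree[q] += 1

print(edges)
print(degree)
```

With these made-up votes the graph splits into two pairs with nothing linking them. Lowering the threshold adds weaker links between groups, which is one way in-between voters would start to show up.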

Fusion Tables also has an experimental Network Graph capability. Opinions differed at the conference on which was the better tool -- in general, it appeared that NodeXL is more robust but Fusion Tables is simpler to use. There are some sample exercises on the Fusion Tables Network Graph help page if you want to start with the simpler tool.

Monitoring Congress

How often is a specific word or phrase mentioned in the U.S. Congress? New York Times developer Derek Willis showed a fun dashboard app he coded that collects all the references to the New York Times in the Congressional Record. Even better, he open-sourced that Paper of (The Congressional) Record code, so you can run your own version looking for different search terms than "New York Times." It's written in JavaScript with data from the Sunlight Foundation's Capitol Words API.
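The core of a phrase tracker like this is a count of mentions per day. The Python sketch below is not Willis's code (his app is JavaScript) and it makes no API calls; it assumes you already have dated text on hand, and the records shown are invented. The count_mentions() helper is hypothetical.

```python
import re
from collections import Counter


def count_mentions(records, phrase):
    """Count case-insensitive occurrences of a phrase per date.

    records is an iterable of (date_string, text) pairs.
    """
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    counts = Counter()
    for date, text in records:
        counts[date] += len(pattern.findall(text))
    return counts


# Hypothetical dated passages, invented for illustration only.
records = [
    ("2013-02-26", "As the New York Times reported, and the new york times later noted."),
    ("2013-02-27", "No newspapers were mentioned today."),
    ("2013-02-27", "I read it in the New York Times."),
]

mentions = count_mentions(records, "New York Times")
for date in sorted(mentions):
    print(date, mentions[date])
```

Swapping in a different phrase is a one-argument change, which is the same idea as running your own version of the dashboard with different search terms.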

For lots more data tools from prior NICAR conferences, head to my chart of 30+ free tools for data visualization and analysis, sortable by category, platform and skill level needed. For links to more presentations from this year's conference, see Tools, Slides and Links from NICAR13 by attendee Chrys Wu.
