IDG News Service - Google has updated and re-released open-source software for cleaning, analyzing and transforming data sets, now called Google Refine.
The software, originally called Freebase Gridworks, came with Metaweb, a company Google purchased in July.
Google Refine is a collection of tools for wrangling useful information out of a data set, particularly one with inconsistent data.
This desktop application can, for instance, find all the variant spellings of a word in a data set and replace them with the appropriate term. This process, called normalization, is nothing new. But normalizing data usually requires writing code that is specific to one data set, noted Christopher Groskopf, a developer for the Chicago Tribune.
"The genius of Gridworks is that it is generic enough to work for a wide variety of data sets without the need to write any code at all. Even better the resulting operations are portable, so the process used to clean up 2009's data can be repeated for 2010," Groskopf wrote in a blog post.
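To make the idea of normalization and portable operations concrete, here is a minimal Python sketch. It is not Google Refine's code or API. The function names (`fingerprint`, `build_mapping`, `normalize`) and the sample values are invented for illustration, and the grouping rule (lowercase, strip punctuation, sort the words) is just one simple way to spot variant spellings.

```python
import string
from collections import Counter, defaultdict


def fingerprint(value):
    """Reduce a string to a rough key: lowercase, no punctuation, sorted unique words."""
    cleaned = value.strip().lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    return " ".join(sorted(set(cleaned.split())))


def build_mapping(values):
    """Map each variant to the most common spelling that shares its fingerprint."""
    groups = defaultdict(Counter)
    for value in values:
        groups[fingerprint(value)][value] += 1

    mapping = {}
    for counter in groups.values():
        canonical = counter.most_common(1)[0][0]
        for variant in counter:
            mapping[variant] = canonical
    return mapping


def normalize(values, mapping):
    """Apply a saved mapping; values the mapping has never seen pass through unchanged."""
    return [mapping.get(value, value) for value in values]


# Hypothetical sample data, made up for this example.
data_2009 = [
    "Chicago Tribune",
    "chicago tribune",
    "Chicago Tribune",
    "Tribune, Chicago",
    "ProPublica",
]

mapping = build_mapping(data_2009)
clean_2009 = normalize(data_2009, mapping)

# The same mapping can be reused on the following year's data.
data_2010 = ["chicago tribune", "ProPublica", "Google"]
clean_2010 = normalize(data_2010, mapping)
```

The mapping built from the first data set plays the role of the portable operation Groskopf describes: it is created once and then reapplied, unchanged, to a later data set.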
The software contains a number of other tools as well. It includes an expression language that can be used to analyze a set of data. Filters can be used to isolate subsets of data, which then can be analyzed or changed through a set of transform commands.
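The filter-then-transform pattern can be sketched in a few lines of Python. This is an illustration only and does not use Refine's expression language. The helper `apply_to_subset`, the predicate, the transform and the sample rows are all hypothetical.

```python
def apply_to_subset(rows, keep, transform):
    """Return new rows, applying transform only to rows where keep(row) is true."""
    return [transform(dict(row)) if keep(row) else dict(row) for row in rows]


def is_chicago(row):
    """Filter: select rows whose city is Chicago, ignoring letter case."""
    return row["city"].lower() == "chicago"


def title_case_city(row):
    """Transform: rewrite the city name in title case."""
    row["city"] = row["city"].title()
    return row


# Hypothetical rows, made up for this example.
rows = [
    {"city": "chicago", "amount": "1,200"},
    {"city": "New York", "amount": "850"},
    {"city": "CHICAGO", "amount": "300"},
]

result = apply_to_subset(rows, is_chicago, title_case_city)
```

Only the rows that pass the filter are changed. The others are copied through as they were, and the original list is left untouched.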
The software can work with up to a few hundred thousand rows per data set, depending on the user's computer memory. And unlike most spreadsheet software, this software can interactively transform large subsets of data, the company asserted.
Google said this week that it has added several new features to the software, officially called Google Refine 2.0, including the ability to link records to other databases, and a number of new transformation commands and expressions.
The non-profit government watchdog organization ProPublica has used this software to aggregate data from seven different data sets to show how pharmaceutical companies pay doctors to recommend certain medications.