22 free tools for data visualization and analysis
What it does: Google Refine can be described as a spreadsheet on steroids for taking a first look at both text and numerical data. Like Excel, it can import and export data in a number of formats, including tab- and comma-separated text files and Excel, XML and JSON files. (Update 8/27/13: This project has been turned over to the open-source community and is transitioning to OpenRefine.)
Refine features several built-in algorithms that find text items that are spelled differently but actually should be grouped together. After importing your data, you simply choose Edit cells > Cluster and edit from a column's menu and select which algorithm you want to use. After Refine runs, you decide whether to accept or reject each suggestion. For example, you could say yes to combining Microsoft and Microsoft Corp., but no to combining Coach Inc. with CQG Inc. If it's offering too few or too many suggestions, you can switch algorithms or adjust their settings to make the matching looser or stricter.
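To make the clustering idea concrete, here is a rough Python sketch of two kinds of method Refine offers: key collision with a "fingerprint" key (normalize case, accents, punctuation and word order, then group values whose keys match) and nearest-neighbor matching by edit distance within a radius, which is one of the settings that makes matching looser or stricter. This is a simplified illustration, not Refine's actual code; the helper names are hypothetical and the normalization steps are an approximation.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Simplified fingerprint key: ignore case, accents, punctuation and word order."""
    value = value.strip().lower()
    # Fold accented characters to their plain ASCII letters.
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Drop punctuation, then split into words.
    tokens = re.sub(r"[^\w\s]", "", value).split()
    # Sort and de-duplicate words so "Corp. Microsoft" == "Microsoft Corp."
    return " ".join(sorted(set(tokens)))


def cluster_by_key(values):
    """Group values whose fingerprint keys collide; keep groups with 2+ variants."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(set(g)) > 1]


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def nearest_neighbor_pairs(values, radius):
    """Pairs of distinct values within `radius` edits; a larger radius yields more suggestions."""
    uniq = sorted(set(values))
    return [(a, b) for i, a in enumerate(uniq) for b in uniq[i + 1:]
            if levenshtein(a, b) <= radius]


print(cluster_by_key(["Microsoft Corp.", "microsoft corp", "Corp. Microsoft", "Microsoft"]))
# [['Microsoft Corp.', 'microsoft corp', 'Corp. Microsoft']]

names = ["Microsoft", "Microsoft Corp.", "Coach Inc.", "CQG Inc."]
print(nearest_neighbor_pairs(names, radius=4))  # [('CQG Inc.', 'Coach Inc.')]
print(nearest_neighbor_pairs(names, radius=6))  # adds ('Microsoft', 'Microsoft Corp.')
```

Note that the fingerprint key never pairs "Microsoft" with "Microsoft Corp.", and a radius loose enough to catch that pair (distance 6) also pairs Coach Inc. with CQG Inc. (distance 4). That is exactly why each suggestion still needs a human yes or no.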
There are also numerical options that offer quick and easy overviews of data distributions. These can reveal anomalies that might be the result of data-entry errors, such as $800,000 instead of $80,000 for a salary entry. They can also expose inconsistencies, such as differences in the way compensation data is reported from entry to entry, with some entries showing, say, hourly wages and others showing weekly pay or yearly salaries.
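Refine presents numeric distributions visually, for you to inspect by eye. A comparable automated check is sketched below, assuming Tukey's 1.5 x IQR rule is an acceptable outlier test; this rule is a swapped-in technique, not something Refine applies for you. The function name and the salary figures are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical salary column with one extra zero typed in.
salaries = [72_000, 80_000, 85_000, 78_000, 800_000, 90_000, 76_000]
print(iqr_outliers(salaries))  # [800000]
```

A rule like this only finds values that are far from the rest; mixed units (hourly vs. yearly pay) can hide inside a wide distribution, so looking at the histogram is still worthwhile.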
Beyond data housekeeping, Google Refine offers some useful analysis tools, such as sorting and filtering.
What's cool: Once you get used to which commands do what, this is a powerful tool for data manipulation and analysis that strikes a good balance between functionality and ease of use. The undo/redo list of every action you've taken lets you roll back when needed. And text functions handle Java-syntax regular expressions, allowing you to look for patterns (such as, say, three letters followed by two digits) as well as specific text strings and numbers.
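Here is what a simple pattern search looks like. It is written in Python, whose regex syntax matches Java's for a pattern this basic; the same expression string should work with Java's java.util.regex.Pattern. The ID format (three letters followed by two digits) and the find_codes helper are hypothetical examples.

```python
import re

# Hypothetical ID format: three ASCII letters followed by two digits, as a whole word.
# [0-9] is used instead of \d because Python's \d also matches non-ASCII digits.
CODE_PATTERN = re.compile(r"\b[A-Za-z]{3}[0-9]{2}\b")


def find_codes(text: str) -> list[str]:
    """Return every standalone three-letters-plus-two-digits code in a cell."""
    return CODE_PATTERN.findall(text)


for cell in ["Order ABC12 shipped", "ref xy123", "codes QRS45 and TUV67"]:
    print(cell, "->", find_codes(cell))
```

The \b word boundaries keep the pattern from matching inside longer tokens such as "ABCD12" or "ABC123".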
Finally, while this is a browser-based application, it works with files on your desktop, so your data remains local.
Drawbacks: Although Google Refine looks like a spreadsheet, you can't do typical spreadsheet calculations with it; for that, you must export to a conventional spreadsheet application. If you've got a large data set, carve out some time in your day to go through all of Refine's suggested changes, since it can take a while. And, depending on the data set, be prepared when looking for text items to merge: You're likely to get either a lot of false positives or missed problems -- or both.
Skill level: Advanced beginner. Knowledge of data analysis concepts is more important than technical prowess; power Excel users who understand data-cleaning needs should be comfortable with this.
Runs on: Windows, Mac OS X (if it appears to do nothing after loading on a Mac, point a browser manually to http://127.0.0.1:3333/ ), Linux.
Sometimes you need to combine graphical representation of your data with heftier numerical analysis.
What it does: R is a general statistical analysis platform (the authors call it an "environment") that runs on the command line. Need to find means, medians, standard deviations, correlations? R can handle that and much more, including "linear and generalized linear models, nonlinear regression models, time series analysis, classical parametric and nonparametric tests, clustering and smoothing," according to the project website.
R also graphs, charts and plots results. There are numerous add-ons to this open-source project that significantly extend functionality. For users who prefer a GUI, Peter Aldhous, San Francisco bureau chief for New Scientist magazine, suggests RExcel, which offers access to the R engine through Excel.
What's cool: There is a great deal of functionality in R, including quite a number of visualization options as well as numerical and spatial analysis.
Drawbacks: The fact that R runs on the command line means that users will have to take the time to learn which commands do what, and not all users will be comfortable with a text-only interface. In addition, Aldhous says those dealing with large data sets may hit a memory barrier (if so, there's a commercial option from Revolution Analytics).
Skill level: Intermediate to advanced. Comfort with command-line prompts and a knowledge of statistics are musts for the core application.
Runs on: Linux, Mac OS X, Unix, Windows XP or later.