There are lots of free data visualization and analysis tools out there -- so many that it's getting tough to keep up with them all. But in an increasingly crowded field, it's hard not to pay attention when a service with the cachet of IBM's Watson joins the fray.
Watson Analytics aims to bring natural language understanding to data work. That means you don't need to structure a proper query in some specialized language to find relationships and patterns in your data; likewise, there's no need for knowledge of statistics to decide which results are significant and which are just noise. Instead, the system does all that in the background for you.
So, say you've uploaded a spreadsheet of data about your customers, including information about customer lifetime value and other factors such as where these customers were first acquired. Now you want to find out what's important about your data. Where to start? With Watson Analytics, you can type in a question such as "What influences customer lifetime value?" to find out which factors are most important in creating (or predicting) high-value customers. Results include visualizations as well as some key snippets of interest about your data.
You can also create visualizations with natural language requests, such as typing in "Customer Value by Acquisition Source" to create a bar chart of those two columns.
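To give a rough sense of what turning a request like that into a chart specification involves, here is a toy Python sketch. It is not how Watson Analytics works internally (IBM's language understanding is presumably far more sophisticated); it only handles the literal pattern "measure by dimension". The function name `parse_chart_request` and the column list are hypothetical, made up for this illustration.

```python
import re


def parse_chart_request(request, columns):
    """Split a 'Measure by Dimension' request into two known column names.

    Toy illustration only: assumes the request is literally
    '<column> by <column>' and that both names appear in `columns`.
    """
    match = re.fullmatch(r"\s*(.+?)\s+by\s+(.+?)\s*", request, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Expected '<measure> by <dimension>', got: {request!r}")

    # Match column names case-insensitively, but return their original spelling.
    lookup = {name.lower(): name for name in columns}
    resolved = []
    for part in match.groups():
        key = part.lower()
        if key not in lookup:
            raise KeyError(f"No column named {part!r}")
        resolved.append(lookup[key])

    measure, dimension = resolved
    return {"chart": "bar", "measure": measure, "dimension": dimension}


# Hypothetical column names for a customer spreadsheet.
columns = ["Customer Value", "Acquisition Source", "Region"]
spec = parse_chart_request("Customer Value by Acquisition Source", columns)
print(spec)
```

A real natural-language interface also has to cope with synonyms, misspellings and questions that don't name columns at all, which is where most of the difficulty lies.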
After uploading a data file (CSV or Excel files accepted), Watson Analytics lets you "explore," "predict" or "view" it -- and, coming soon, re-shape/refine it.
Note that "predict" here is used in the data-science context of "what factors are most likely to influence the value of a column of data I care about?" -- the way, say, the Obama campaign micro-targeted likely Democratic voters based on where they lived, what TV shows they watched and so on. Predict doesn't mean actually modeling future results, such as forecasting what next month's sales will be based on patterns from the last few years.
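To make that distinction concrete, here is a minimal Python sketch of a "which factors matter?" analysis: it ranks each column by the strength of its linear correlation with a target column. This is an assumption-laden stand-in, not Watson Analytics' actual method, which the service doesn't expose. The data below is invented for illustration, and `rank_drivers` is a hypothetical helper name.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_drivers(columns, target):
    """Rank every non-target column by |correlation| with the target."""
    scores = {
        name: abs(pearson(values, columns[target]))
        for name, values in columns.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented sample data: five customers, three numeric columns.
customers = {
    "lifetime_value": [120.0, 340.0, 560.0, 790.0, 1010.0],
    "orders_per_year": [1.0, 3.0, 5.0, 7.0, 9.0],
    "support_tickets": [2.0, 5.0, 1.0, 4.0, 3.0],
}

ranking = rank_drivers(customers, "lifetime_value")
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Note what the sketch does not do: it says nothing about what lifetime value will be next month. It only describes which existing columns move together with the one you care about, which is the sense of "predict" used here.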
If you choose explore, Watson Analytics will suggest possible questions you might want to investigate. For a file of Baltimore city employee salaries I added to my account, Watson proposed a number of different starting points, such as: What is the trend of gross pay over year? What is the breakdown of annual salary by agency? What is the breakdown of annual salary by job title?
When I selected annual salary by agency, Watson showed me a nice interactive tree diagram -- with totals of all salaries in a department, which wasn't quite what I had in mind.
The good news: There was an easy way to change the default aggregation from sum to average, so I could look at the typical employee salary in each department and not each department's total payroll. The bad news: That same easy way to change default aggregation didn't include median.
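Why does the missing median matter? A short Python sketch with invented salary figures (not the Baltimore data) shows how the three aggregations can tell different stories about the same department. The helper `aggregate_by` and the agency names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def aggregate_by(rows, group_key, value_key):
    """Group rows by one field and report sum, mean and median of another."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {
        group: {"sum": sum(values), "mean": mean(values), "median": median(values)}
        for group, values in groups.items()
    }


# Invented figures: one highly paid employee skews Agency A's average.
salaries = [
    {"agency": "Agency A", "annual_salary": 40000},
    {"agency": "Agency A", "annual_salary": 50000},
    {"agency": "Agency A", "annual_salary": 150000},
    {"agency": "Agency B", "annual_salary": 30000},
    {"agency": "Agency B", "annual_salary": 40000},
]

summary = aggregate_by(salaries, "agency", "annual_salary")
for agency, stats in summary.items():
    print(agency, stats)
```

In this made-up example, Agency A's total payroll is 240,000 and its average salary is 80,000, yet the median is only 50,000. When a few large salaries pull the average up, the median is usually the better answer to "what does a typical employee earn?"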
Trying to predict what factors influence annual salaries didn't work out too well, as the system didn't find any useful predictors. So, I loaded another data set: a file of 50,000 or so diamond sale prices including factors such as clarity, quality of cut and various size measurements for each diamond (this file is familiar to anyone who has tried to learn the R ggplot2 package, as it is included as sample data).
Watson Analytics showed that x, y and z measurements along with carat drove a diamond's price at roughly 75% predictive strength. I was able to view the predictions in multiple ways, including a decision tree with five rules predicting highest prices.
When I wanted to visualize the data, I was invited to type in "my intent" such as "Price by clarity." Once I did that, up popped a bar chart similar to the one at the top of this page. There were a number of color customization options, although I didn't see a way to change that visualization type to some other kind of dataviz. Various chart/graph options are available for visualizing data when choosing the explore option, though.
Overall, the exploratory visualization interface appeared to be pretty polished, with menus to add columns, functions, filters and more. Yet there were still some basic things I found difficult to do, such as getting all my tree-map tiles, or all the bars in a bar chart, to show again after selecting one to view details.
This is still a service in beta. I tried the share option several times, but never did receive an email with my explore visualization as an attached image. And, even when I adhered to my account limits -- maximum file size of 0.4GB and number of columns not to exceed 50 -- I received a "Maximum data source record count quota has being [sic] exceeded" error (the file was just 25MB with 17 columns). Despite the numerous video tutorials as well as a documentation section, some more basic information would be helpful.
Bottom line? Watson Analytics is an intriguing first step in applying IBM's "cognitive computing" to the challenge of data analysis, although it's not ready to replace high-powered enterprise tools just yet. I plan to be watching as the service evolves.