When analyzing data, actual analysis usually isn't your only task -- in fact, sometimes it's not even the most daunting. Acquiring, formatting and removing errors from data can pose significant challenges (not to mention take up lots of time).
Quandl, a small Canadian startup, aims to ease those tasks by not only aggregating data, but making it analysis-ready out of the gate.
"I have probably spent weeks of my life trying to find data on the web," founder Tammer Kamel writes in a blog post at Revolution Analytics. "And several more weeks validating, formatting and cleaning the data."
"We've built a sort of 'universal data parser' which has thus far parsed about 2.8 million datasets," Kamel explained in his post. That's created a "sort of search engine for numerical data. The idea with Quandl is that you can find data fast. And more importantly, once you find it, it is ready to use. This is because Quandl's bot returns data in a totally standard format. Which means we can then translate to any format a user wants."
Quandl started by providing data in Excel, CSV, XML and JSON formats. They've got a beta add-in for Excel that allows you to pull one of their data sets directly into your spreadsheet (free API token needed), and they just launched a package for R, the open-source R Project for Statistical Computing platform. "Python, Ruby, Matlab, and Stata are next," Kamel told ProgrammableWeb.com. They also have an API so you can pull the data into your own applications -- there's even an "API call" option with every data search result, making it easy to see how to use the API to pull that specific data.
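The "one standard format, any output format" idea Kamel describes can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `series` structure, the `to_csv` and `to_json` helpers, and the placeholder values are hypothetical, and none of it reflects Quandl's actual internal format or API.

```python
import csv
import io
import json

# Hypothetical normalized time series: a name, column labels, and dated rows.
# The structure and the values are placeholders, not real Quandl data.
series = {
    "name": "example_series",
    "columns": ["date", "value"],
    "rows": [
        ("2013-01-01", 7.9),
        ("2013-02-01", 7.7),
        ("2013-03-01", 7.6),
    ],
}


def to_csv(s):
    """Render the normalized series as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(s["columns"])
    writer.writerows(s["rows"])
    return buf.getvalue()


def to_json(s):
    """Render the normalized series as a JSON document of labeled records."""
    records = [dict(zip(s["columns"], row)) for row in s["rows"]]
    return json.dumps({"name": s["name"], "data": records})


print(to_csv(series))
print(to_json(series))
```

Because every data set is held in the same shape, supporting another output format in this sketch would only mean writing one more small function, rather than a converter for each source.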
The site specializes in data that is collected over time and so far has data categorized as "financial, economic and social," such as stock and commodity prices, unemployment rates, crime rates and populations. There are even some basic sports stats, although generally not beyond compiling a historical record of what you can see in an expanded standings table with things like wins, losses and points/goals/runs for and against.
Quandl acquires its data by having human curators point its bot at specific data sets (as opposed to simply spidering the Web at large). They will soon be looking for volunteer curators similar to Wikipedia's volunteer editors.
"I feel like I have been handed a surfboard to handle the data tsunami," one commenter wrote in a Google+ statistics group focused on R.
Quandl joins a space similar to Datamarket, the Iceland- and Cambridge, MA-based data company that offers both free and premium access to data and visualization tools. Datamarket for now seems to have more of a focus on end-user interactions and visualizations as well as the raw data itself. For example, its Housing Price Index by State lets you select by state and add more than one item to a graph. And it allows paid users to embed its graphics on external websites.
Update May 9, 2013: Quandl now allows any of its data visualizations to be embedded on external websites.
Datamarket requires a free account for any data download; Quandl does not. Another difference: Datamarket currently charges for API calls after the first 50 per month, while Quandl's API is free. Both Quandl and Datamarket allow download in R console code format.
"I do think we're both providing data offerings that differ enough that we both can provide unique value to data users," Kamel told ProgrammableWeb.com when asked about Datamarket. "If things continue to go well for us I feel there's more than enough room for the both of us in the market, and I think it is the big, old 'closed data' companies that we can disrupt."
It remains to be seen whether there's room enough for two such data search engines. But for the sake of all of us who scour and clean data, I wish success -- as well as lots more data and features -- for them both.