Google BigQuery update aims to entice Hadoop users
BigQuery users can now combine query results from multiple tables
IDG News Service - Hoping to lure more Apache Hadoop users to its own data analysis services, Google has outfitted BigQuery with the ability to combine data from multiple large tables in a single query.
"Joining terabyte-sized tables has traditionally been a challenging task for data analysts, requiring sophisticated MapReduce development skills, powerful hardware, or a lot of time -- often all three," wrote Ju-kay Kwek, Google BigQuery product manager, in a blog post announcing the update. "Today with BigQuery you can get directly to business insights using SQL-like queries, with far less effort and far greater speed than you could before."
Google also argued that using BigQuery instead of a Hadoop deployment will save users money, because they only pay for the queries that are processed, rather than pay for the computational costs of running individual Hadoop supporting components.
Launched in 2010, BigQuery has been marketed by Google as an interactive service for parsing large amounts of data. With BigQuery, a user submits a data set to Google and can then query the data through the BigQuery API (application programming interface).
The update expands capabilities BigQuery already has in place. Most notably, it adds a new JOIN clause that combines the results of a query across multiple data sources. Prior to this update, BigQuery's JOIN clause could only work with a data set smaller than 8MB. The new clause, JOIN EACH, has no limit on the size of the data.
As a result, the service can now be used more effectively as a replacement for Hadoop's MapReduce. Many Hadoop jobs are designed to bring together large amounts of data from two or more data sets. To do this, however, developers must write MapReduce processes from scratch, which can be time-consuming. JOIN EACH can produce a single result set from two large database tables that share a common key.
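To make the comparison concrete, here is a minimal sketch of the kind of join logic a developer would otherwise write by hand in the MapReduce style: a map step that tags each record with its source table, a shuffle that groups records by the shared key, and a reduce step that pairs them up. It runs in plain Python on tiny in-memory lists rather than on Hadoop, and the table names, column names and the query string are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import product

# Assumed shape of the equivalent BigQuery query (table names are hypothetical).
JOIN_EACH_QUERY = (
    "SELECT a.user_id, a.name, b.total "
    "FROM [mydataset.users] AS a "
    "JOIN EACH [mydataset.orders] AS b ON a.user_id = b.user_id"
)


def map_phase(table_name, rows, key):
    """Tag every row with its source table and emit (join_key, (table, row))."""
    for row in rows:
        yield row[key], (table_name, row)


def shuffle(pairs):
    """Group the mapped pairs by join key, as a framework's shuffle step would."""
    groups = defaultdict(list)
    for join_key, tagged_row in pairs:
        groups[join_key].append(tagged_row)
    return groups


def reduce_phase(groups, left_name, right_name):
    """For each key, pair every left-table row with every right-table row."""
    for join_key in sorted(groups):
        left = [row for table, row in groups[join_key] if table == left_name]
        right = [row for table, row in groups[join_key] if table == right_name]
        for left_row, right_row in product(left, right):
            merged = dict(left_row)
            merged.update(right_row)
            yield merged


def reduce_side_join(left_rows, right_rows, key):
    """Inner join of two lists of dicts on a shared key."""
    pairs = list(map_phase("left", left_rows, key))
    pairs += list(map_phase("right", right_rows, key))
    return list(reduce_phase(shuffle(pairs), "left", "right"))


users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
orders = [
    {"user_id": 1, "total": 30},
    {"user_id": 1, "total": 12},
    {"user_id": 3, "total": 5},
]
joined = reduce_side_join(users, orders, "user_id")
```

Only user 1 appears in both lists, so `joined` holds two rows, one per matching order. The point of the sketch is the amount of plumbing involved: the single query string at the top expresses the same inner join.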
"With these capabilities, you will now be able to join and perform aggregate analysis on multi-terabyte datasets using SQL-like queries or integrated [third] party tools, instead of having to initiate complex coding projects," wrote Michael Manoochehri, Google's cloud platform developer programs engineer, in a technical blog post explaining the update.
BigQuery now also offers a better way to group query results. The GROUP BY EACH statement increases the number of distinct entities that can be grouped in a result set, though at a potential cost to processing performance.
The BigQuery update includes a couple of other new features. The service has more support for timestamps: BigQuery can now import timestamps from other systems and query timestamp data. Users can now add columns to existing tables. Users can also bookmark the specific datasets they have access to and receive automated emails when they have been given access to a new dataset.
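The trade-off described here, more distinct groups in exchange for extra processing, is typical of partitioned aggregation in general. The sketch below illustrates that general technique in plain Python; it is an assumption for illustration, not a description of how Google implements GROUP BY EACH. Rows are first routed to shards by a hash of the grouping key, then each shard is counted on its own. The function names and the shard count are hypothetical.

```python
import zlib
from collections import Counter


def shard_of(key, num_shards):
    """Pick a shard for a key using a stable hash (CRC32 of its string form)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_shards


def partitioned_count(rows, key, num_shards=4):
    """Count rows per distinct key by hash-partitioning, then counting each shard."""
    # Extra pass over the data: this routing step is the added cost.
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_of(row[key], num_shards)].append(row)

    # Each shard only ever sees its own subset of the distinct keys,
    # so no single worker has to hold every group at once.
    result = {}
    for shard in shards:
        result.update(Counter(row[key] for row in shard))
    return result


page_views = [
    {"page": "home"},
    {"page": "search"},
    {"page": "home"},
    {"page": "about"},
    {"page": "home"},
]
counts = partitioned_count(page_views, "page")
```

Because every row with the same key lands in the same shard, the per-shard counts never overlap and can simply be merged into one result.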
Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com