Review: Google Bigtable scales with ease

If your data runs to hundreds of terabytes or more, look to Cloud Bigtable for high performance, ease of use, and effortless scaling without downtime.

When Google announced a beta test of Cloud Bigtable in May 2015, the new database as a service drew lots of interest from people who had been using HBase or Cassandra. This was not surprising. Now that Cloud Bigtable has become generally available, it should gain even more attention from people who would like to collect and analyze extremely large data sets without having to build, run, and sweat the details of scaling out their own enormous database clusters.

Cloud Bigtable is a public, highly scalable, column-oriented NoSQL database as a service that uses the very same code as Google’s internal version, which Google developed in the early 2000s and described in a paper published in 2006. Bigtable was and is the underlying database for many Google services, including Search, Analytics, Maps, and Gmail.

Bigtable inspired several open source NoSQL databases, including Apache HBase, Apache Cassandra, and Apache Accumulo. HBase was designed as an implementation of Bigtable based on the paper and became the primary database for Hadoop. Cassandra was born at Facebook using ideas from Bigtable and the key-value store Amazon Dynamo. Accumulo is a sorted, distributed key-value store with cell-based access control that started out as the NSA’s secure take on Bigtable.

While HBase had its moment in the sun, its market share now isn’t as large as most in the industry expected a few years ago. As Matt Asay explained earlier this year, “its narrow utility and inherent complexity have hobbled its popularity and allowed other databases to claim the big data crown.” And as Rick Grehan explained in depth in 2014, HBase has too many moving parts and is too hard to set up and tune for mere mortals.

While Cassandra is a bit more popular, has a SQL-like query language, and is easier to get up and running than HBase, it is still complicated and has a significant learning curve. Accumulo is more of a niche database, used primarily in government applications.

