Which freaking database should I use?
In the era of big data, good old RDBMS is no longer the right tool for many database jobs. Here's a quick guide to choosing among NoSQL alternatives
Infoworld - I've been in Chicago for the last few weeks setting up our first satellite office for my company. While Silicon Valley may be the home of big data vendors, Chicago is the home of the big data users and practitioners. So many people here "get it" that you could go to a packed meetup or big data event nearly every day of the week.
Big data events almost inevitably offer an introduction to NoSQL and why you can't just keep everything in an RDBMS anymore. Right off the bat, much of your audience is in unfamiliar territory. There are several types of NoSQL databases and rational reasons to use them in different situations for different datasets. It's much more complicated than tech industry marketing nonsense like "NoSQL = scale."
[ Andrew Oliver declares the time for NoSQL standards is now. | Also on InfoWorld: NoSQL standouts: New databases for new applications | Get a digest of the key stories each day in the InfoWorld Daily newsletter. ]
Part of the reason there are so many different types of NoSQL databases lies in the CAP theorem, aka Brewer's Theorem. The CAP theorem states you can provide only two out of the following three characteristics: consistency, availability, and partition tolerance. Different datasets and different runtime rules cause you to make different trade-offs. Different database technologies focus on different trade-offs. The complexity of the data and the scalability of the system also come into play.
Another reason for this divergence can be found in basic computer science or even more basic mathematics. Some datasets can be mapped easily to key-value pairs; in essence, flattening the data doesn't make it any less meaningful, and no reconstruction of its relationships is necessary. On the other hand, there are datasets where the relationship to other items of data is as important as the items of data themselves.
Relational databases are based on relational algebra, which is more or less an outgrowth of set theory. Relationships based on set theory are effective for many datasets, but where parent-child or distance of relationships are required, set theory isn't very effective. You may need graph theory to efficiently design a data solution. In other words, relational databases are overkill for data that can be effectively used as key-value pairs and underkill for data that needs more context. Overkill costs you scalability; underkill costs you performance.
Key-value pair databasesKey-value pair databases include the current 1.8 edition of Couchbase and Apache Cassandra. These are highly scalable, but offer no assistance to developers with complex datasets. If you essentially need a disk-backed, distributed hash table and can look everything up by identity, these will scale well and be lightning fast. However, if you find that you're looking up a key to get to another key to get to another key to get to your value, you probably have a more complicated case.
- 10 Hot Big Data Startups to Watch
- 11 Unique Uses for Google Glass, Demonstrated by Celebs
- How to Export Your Google Reader Account
- How to Better Engage Millennials (and Why They Aren't Really so Different)
- Telltale signs of ATM skimming
- 20 security and privacy apps for Androids and iPhones
- Big screen con artists: 7 great movies about social engineering
- IT Certification Study Tips
- Register for this Computerworld Insider Study Tip guide and gain access to hundreds of premium content articles, cheat sheets, product reviews and more.
- Cloud Analytics for the Masses Learn the best practices in building applications that can leverage volume, variety and velocity of Big Data for organizations of any size.
- Big Data, Big Demands While the concept of big data is nothing new, the tools and technology are in place for companies to take full advantage. Enterprises...
- Big Data Transforms Business eBook Businesses that exploit Big Data to improve their strategy and execution can distance themselves from competitors by using new insights from data that...
- Harness IT -- An Introduction to Business Intelligence Solutions Learn the key selection criteria required to provide your organization with the capability to address structured data, unstructured data and mobile demands so...
- Content Analytics: Big Data Conquered, Customer Service Elevated For organizations looking to start a content analytics program or improve their existing capabilities, Aberdeen Group and IBM will lay out several recommendations...
- 3 Reasons Why Sepaton is the World's Fastest Backup Solution Leading analyst, Storage Switzerland learns how Sepaton backs up and deduplicates massive data volumes while maintaining the industry's fastest performance - all in... All Big Data White Papers | Webcasts
Get started with this popular programming language for data visualization and analysis. Read more....