Which freaking database should I use?
In the era of big data, good old RDBMS is no longer the right tool for many database jobs. Here's a quick guide to choosing among NoSQL alternatives
Infoworld - I've been in Chicago for the last few weeks setting up our first satellite office for my company. While Silicon Valley may be the home of big data vendors, Chicago is the home of the big data users and practitioners. So many people here "get it" that you could go to a packed meetup or big data event nearly every day of the week.
Big data events almost inevitably offer an introduction to NoSQL and why you can't just keep everything in an RDBMS anymore. Right off the bat, much of your audience is in unfamiliar territory. There are several types of NoSQL databases and rational reasons to use them in different situations for different datasets. It's much more complicated than tech industry marketing nonsense like "NoSQL = scale."
[ Andrew Oliver declares the time for NoSQL standards is now. | Also on InfoWorld: NoSQL standouts: New databases for new applications | Get a digest of the key stories each day in the InfoWorld Daily newsletter. ]
Part of the reason there are so many different types of NoSQL databases lies in the CAP theorem, aka Brewer's Theorem. The CAP theorem states you can provide only two out of the following three characteristics: consistency, availability, and partition tolerance. Different datasets and different runtime rules cause you to make different trade-offs. Different database technologies focus on different trade-offs. The complexity of the data and the scalability of the system also come into play.
Another reason for this divergence can be found in basic computer science or even more basic mathematics. Some datasets can be mapped easily to key-value pairs; in essence, flattening the data doesn't make it any less meaningful, and no reconstruction of its relationships is necessary. On the other hand, there are datasets where the relationship to other items of data is as important as the items of data themselves.
Relational databases are based on relational algebra, which is more or less an outgrowth of set theory. Relationships based on set theory are effective for many datasets, but where parent-child or distance of relationships are required, set theory isn't very effective. You may need graph theory to efficiently design a data solution. In other words, relational databases are overkill for data that can be effectively used as key-value pairs and underkill for data that needs more context. Overkill costs you scalability; underkill costs you performance.
Key-value pair databasesKey-value pair databases include the current 1.8 edition of Couchbase and Apache Cassandra. These are highly scalable, but offer no assistance to developers with complex datasets. If you essentially need a disk-backed, distributed hash table and can look everything up by identity, these will scale well and be lightning fast. However, if you find that you're looking up a key to get to another key to get to another key to get to your value, you probably have a more complicated case.
- 15 Non-Certified IT Skills Growing in Demand
- How 19 Tech Titans Target Healthcare
- Twitter Suffering From Growing Pains (and Facebook Comparisons)
- Agile Comes to Data Integration
- Slideshow: 7 security mistakes people make with their mobile device
- iOS vs. Android: Which is more secure?
- 11 sure signs you've been hacked
- Aberdeen Group: Marketing Analytics for Manufacturing: Forging Customer Insights There are no recalls for poor marketing. Manufacturers need to get their customer intelligence and messaging right the first time. Learn how.
- SIEM: Keeping Pace with Big Security Data Learn how SIEM can have the right database back-end and offer security intelligence that leverages contextual data to achieve a strong security posture...
- Is Your Big Data Solution Production-Ready? Read "Is Your Big Data Solution Production-Ready?" now, and discover best practices and actionable steps to implementing a production-ready big data solution.
- Pay-as-you-Grow Data Protection: IBM Tivoli's Full-featured Data Protection Suite for Small to Medium Businesses IBM Tivoli Storage Manager Suite for Unified Recovery gives small and medium businesses the opportunity to start out with only the individual solutions...
- Webinar: Building a Big Data solution that's production-ready Big data solutions are no longer just a nice-to-have.
- Meg Whitman presents Unlocking IT with Big Data During this Web Event you will hear Meg Whitman, President and CEO, HP discuss HAVEn - the #1 Big Data platform, as well... All Databases White Papers | Webcasts