The road to SQL started with Dr. E.F. Codd's paper, "A Relational Model of Data for Large Shared Data Banks", published in Communications of the ACM in June 1970. His colleagues at IBM, Donald Chamberlin and Raymond Boyce were working on a query language (originally named SQUARE, Specifying Queries As Relational Expressions) that culminated in the 1974 paper, "SEQUEL: A Structured English Query Language". Since that time, SQL has become the dominant language for relational database systems. In recent years, frameworks and architectures have arrived on the programming scene that attempt to hide (or completely remove) the use of SQL and relational databases allowing developers to focus even more on user interfaces, business logic and platform support in our application development. We have also seen the rise of alternatives to relational databases, the so-called "NoSQL" data stores. Are we witnessing the end of SQL and relational databases?
In a December DDJ podcast interview by Mike Riley, I was asked the question: "With the growing trend of ORMs (Object Relational Mapping), some software developers are saying that SQL is becoming less relevant. What are your opinions about that statement?" Thinking over the holidays about that question and the implications including and beyond ORMs, I spent some time thinking more about frameworks like Ruby on Rails Active Record and Hibernate. These frameworks still rely on team members who work on the design, development and maintenance of relational databases. The work that Microsoft has done with LINQ also reduces the impedence mismatch between programming languages and database languages.
The "NoSQL movement" and Cloud based data stores are striving to completely remove developers from a reliance on the SQL language and relational databases. Some developers think that the NoSQL movement is something completely new. Object databases appeared in the 1980s and Ray Ozzie started the commercial document-centric datastore business with Lotus Notes in the 1990's. Charlie Caro, Senior Software Engineer who works on Embarcadero's InterBase SQL database engine, told me "At the time, it seemed almost heretical that a data model that ignored concurrency control could flourish. But Ozzie recognized the benefits of distribution, replication and ease-of-setup would outweigh the very, infrequent occurrence of update conflicts in managing documents and messages. Even so, there were opt-in capabilities if documents needed to be locked for safe modification without data loss. But the default was to just let it fly with no update coordination."
NoSQL, according to WikiPedia, is "an umbrella term for a loosely defined class of non-relational data stores". The term was first used by Rackspace employee Eric Evans. In his blog post last October, Eric talks about the name NoSQL (now generally known to mean Not Only SQL). The real nugget in the post is "the whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for." Adam Keys in his The Real Adam blog post offered up an alternative term, "Post-Relational". One goal of some NoSQL databases is to remove the overhead and memory footprint of relational databases. Other goals of NoSQL databases include: closer affinity with programming languages, use of web technologies and RPC calls for access and optional forms of data query.
In a recent blog post, "The "NoSQL" Discussion has Nothing to Do With SQL" Professor Michael Stonebraker discusses comparisons of SQL and NoSQL databases. SQL and NoSQL databases can be compared using any or all of the following features and characteristics. (Note: there are many more that can be added to this list for either or both types of databases. Post a comment with additional items that might characterize the differences between the two types):
- Vertically and Horizontally Scalable - relational databases (traditionally) reside on one server which can be scaled by adding more processors, memory and storage to provide scalability. Relational databases residing on multiple servers usually use replication to keep the databases synchronized. NoSQL databases can reside on a single server but more often are designed to work across a cloud of servers. (NoSQL: Distributed and Scalable Non-Relational Database Systems)
- Columns, Key/Value Store, Tuple Store - relational databases are usually comprised of a collection of columns in a table and/or view (fixed schema, join operations). NoSQL databases often store a combination of key and value pairs or Tuples (schema free, an ordered list of elements).
- In-Memory Dataset, On-Disk Storage - relational databases almost always reside on a disk drive or a storage area network. Sets of database rows are brought into memory as part of SQL select or stored procedure operations. Some (but not all) of the NoSQL databases are designed to exist in memory for speed and can be persisted to disk.
- Document-Oriented, Collection-Oriented, Column-Oriented, Object-Oriented, Set-Oriented, Row-Oriented - Document-oriented databases contain records of documents, fields and XML. Collection-oriented datasets provide closer affinity to object-oriented programming languages. Relational databases are characterized by data organized in tables, rows and columns (Column-Oriented). SQL select operations normally return cursors to a single row or a set of rows containing the columns specified. Object-Oriented databases have been around since the dawn of object-oriented programming but relational databases are by far the dominant database architecture in use today (and for years to come). Are object databases NoSQL databases? The rise of object-relational mapping (ORM) frameworks marries object-oriented programming to the larger number of relational databases. NoSQL databases data are often organized into objects, key/value pairs or tuples. NoSQL database election operations are often done in code or an interface.
In an email conversion, Charlie Caro told me the following: "If Facebook has to manage 100,000,000's of user profiles, a distributed, context-free key-value store is probably the way to go. Simple lookup by large numbers of users but only a single updater of the profile by the owning user means a transactional database is probably overkill. Multiple readers, with one updater, needs no concurrency control. In many cases, the simpler setup and ease-of-use of the NoSQL solutions is what attracts its user community. SQL databases require more setup (schema et al) but those schemas are exactly what give relational databases higher performance opportunity in parallel DBMS implementations. This ease of use premium finds its way in programming languages as well. Many developers today prefer scripting languages over their compiled counterparts, which have static type checking for safer programs. Scripting languages are just so forgiving and easy to get started with and projects to compile scripts into .NET/Java intermediate byte codes are increasing their performance." We both agreed that it's really all about having the right tool for the job, and it always has been! No sense hammering a screw into the wall when you can use a screwdriver.
Stay tuned for "The end of SQL databases - part 2" (of a 3 part series) where we will take a walk through some of the currently available open and closed source NoSQL databases. Then in "The end of SQL databases - part 3" I will point you to NoSQL Internet resources, past and upcoming events and offer some guidance.
For years to come, most of us will still rely on and use relational databases and SQL. I certainly will. I will also continue to look for better ways to isolate and encapsulate data access in my applications. As always, any project decisions have to match with the user and business requirements. For new projects, I believe, we have genuine non-relational alternatives on the table (pun intended). Are you using a non-relational database? Have you given up on SQL and relational databases? Are you moving your data to a public or private cloud? Post a comment.
The end of SQL and relational databases?
Programming is Life!
Recent news for developers:
- Software Engineering Institute Marks 25th Anniversary
- 2010 Grace Hopper Celebration of Women in Computing Opens Call for Participation
- Apache focuses on Java RIA effort
- Last of Python 2.x language line moving forward
- Mozilla dumps Firefox 3.7 from schedule, changes dev process
- Microsoft Visual Studio 2010 launch rescheduled for April 12
- Dynamic Language Support in the .NET 4.0 Framework
- The Fifth Underhanded C Contest is Now Open
David Intersimone (David I) is the Vice President of Developer Relations and Chief Evangelist for Embarcadero Technologies. My company blog is at http://blogs.embarcadero.com/davidi