At times, I get nostalgic for the good old days - a simpler time when the Olympics were in Beijing, WALL-E was the number one movie at the box office, the Phillies won the World Series, and the network storage world could be neatly divided into file and block-based approaches.
Yes, back in the good old days before "the cloud" had burst onto the scene, if you were dealing with structured data (databases and the like), you used a block-based, SAN approach. If you were dealing with unstructured data (large files and folders, videos, medical images, etc.), you went with a file-based, NAS approach.
Then, things began to change. People began to see the advantages of having all that unstructured data stored and available in "the cloud." They also began to think about new, massive workloads (e.g. big data analytics), which could only be handled by distributing compute and storage across a huge number of devices. Furthermore, they became enamored of the possibility of storing data on almost any device located almost anywhere, connected via the Internet.
Of course, once the network was "the Internet," it became apparent that approaches that worked well over fast local networks attached to fast, monolithic storage didn't necessarily work well when the network was slow and jittery and the storage devices were far-flung and slow.
Enter object storage. The object-based approach differs from block-based approaches by organizing data into flexible-sized data containers, called objects. Each object has both data (an uninterpreted sequence of bytes) and metadata (an extensible set of attributes describing the object). It differs from file-based approaches in that, rather than relying on a hierarchical, tree-based file structure, object storage systems use a single flat address space. And rather than relying on NFS, CIFS, or other chatty protocols that don't scale particularly well across the Internet, objects are created, retrieved, deleted, and moved using simple GET and PUT requests.
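To make that access model concrete, here's a rough sketch in Python. The endpoint, bucket name, object key, and metadata header below are all invented for illustration, and a real object store would layer authentication on top, but the shape of the interaction really is this simple:

```python
import requests

# Hypothetical S3-style endpoint and bucket; real services also require
# authentication headers, which are omitted here for brevity.
BASE_URL = "https://objects.example.com/my-bucket"

# PUT: create (or overwrite) an object -- just a key and a stream of bytes.
data = b"an uninterpreted sequence of bytes"
requests.put(f"{BASE_URL}/reports/q3.bin",
             data=data,
             headers={"x-meta-department": "finance"})  # extensible metadata

# GET: retrieve the object by its key in the flat namespace.
# "reports/q3.bin" is a single opaque key, not a directory path.
resp = requests.get(f"{BASE_URL}/reports/q3.bin")
print(resp.content)

# DELETE: remove the object.
requests.delete(f"{BASE_URL}/reports/q3.bin")
```

Compare that single round trip per operation to the stream of opens, lookups, and locks a file protocol performs to do the same work over a wide-area network.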
With the promise of simplicity, efficiency, economics, and scale, it's no surprise that object-based approaches have begun to dominate cloud storage.
But, there are some lingering problems with object-based approaches.
Perhaps the most important problem is that virtually all of the applications built inside enterprises assume the existence of a POSIX-compliant file system. Moving those applications from on-premises to cloud-based architectures therefore means rewriting them for a new storage paradigm. The cloud has not embraced POSIX compliance; it has favored object storage because the thin protocol associated with objects makes creating, moving, and deleting data very fast compared to the chatty protocols I mentioned previously. But forcing a rewrite of every legacy application defeats the purpose of the cloud as a flexible extension of existing infrastructure.
Even if you are writing applications from scratch, object-based approaches pose disadvantages. For one thing, many impose both a minimum and a maximum object size. If your application deals in small files, you either accept serious inefficiency or push the burden onto the application to package small files into larger objects. Furthermore, object-based systems often rely on a centralized metadata store, which creates both a performance bottleneck and a single point of failure. Finally, such systems typically limit the number of objects to what can be held in a single system's memory.
MapReduce, a software framework for distributed computing on large data sets across clusters of computers, is another example of successful innovation; even there, however, performance is held back by a file system implementation that relies on the outdated metadata server model.
Clearly, what is needed is an approach that combines the best of both worlds: a unified file and object storage approach that pairs the speed and simplicity of object storage with the richness, flexibility, and compatibility of traditional file systems. Such an approach should enable files to be treated as objects, objects to be treated as files, folders to be treated as buckets, and buckets to be treated as folders. In other words, storage should be access agnostic.
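To illustrate what "access agnostic" means in practice, here is a hypothetical sketch: the same data written as a file through an ordinary POSIX path, then read back as an object over HTTP. The mount point, endpoint, and names are invented for the example; the point is that neither application needs to know how the other sees the data.

```python
import requests

# Assumption: a unified store is mounted at /mnt/unified and is also exposed
# at https://objects.example.com, with the top-level folder "projects"
# doubling as a bucket. All names here are hypothetical.

# A legacy application writes an ordinary file through the POSIX interface.
with open("/mnt/unified/projects/design.txt", "w") as f:
    f.write("written as a file\n")

# A cloud application reads the very same bytes as an object:
# the folder becomes the bucket, the file path becomes the object key.
resp = requests.get("https://objects.example.com/projects/design.txt")
print(resp.text)  # -> "written as a file"
```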
As should be clear from my previous posts, I think unifying file and object also needs to be driven from a software point of view. While there are unified approaches out there that are hardware-based, this seems to me to be fundamentally at odds with the notion of the cloud.
The "bolt on" approach (attaching storage to the component assemblies and then carving out specific disk drives and capacities in order to dedicate storage to each component) may technically enable NAS, SAN and object capabilities, but it comes with added costs and risks. Users are purchasing much more storage capacity than they need separately for either file or object storage. With the inherent risks associated with common power, cooling, and cabling failures, users are exposed to the whole combinational assembly failing with complete loss of access to their data. And, at the end of the day, you aren't getting the scaling and flexibility that was the whole rationale behind a cloud-based approach.
By contrast, unifying things at the software layer lets people be both access agnostic and hardware agnostic.
Thus, organizations can:
- Build their own cloud services within their private cloud, or data center.
- Transfer objects between the private and public cloud, and even create an object storage environment on non-object-based storage in the cloud.
- Move legacy file-based applications to the cloud.
- Build new cloud-based applications that can leverage both object and file storage.
It's true: I believe in UFOs. And, I believe the time is now.
Ben Golub is President and CEO at Gluster. He is on Twitter @golubbe.