The promise of next-generation file I/O protocols

Driven by factors such as regulatory requirements, multimedia content and e-mail (spam included), storage consumption continues to escalate even as its unit cost declines. Within this dynamic environment, cash-strapped end users struggle with serious issues such as disaster recovery and business continuance. Simultaneously, a flood of new user demands is being evaluated against a flow of new technologies. Thus, we live in interesting times.

Complexity is a major, ongoing problem. Storage management, for instance, has probably been written and talked about more than any other current IT problem. However, the problem isn't really one of management; it's one of complexity, and the current crop of management tools is largely being used to mask that inherent complexity.

Simply put, today's technologies have come a long way toward making storage better, faster and cheaper, but they have not done a good job of eliminating complexity. As a result, many products remain cost-prohibitive.

Evaluating I/O alternatives

Consider storage evolution: First, there was direct-attached storage (DAS). Although arguably limiting (because systems could not share storage) and costly (because each server's storage had to be managed separately), it was conceptually elegant. Then came storage area networks (SANs), which enabled systems to share storage. Boundaries were removed and resources shared, but at the price of a dramatic increase in complexity. Hence, complexity in storage is usually discussed in the context of these block-oriented storage systems, which are primarily implemented as Fibre Channel SANs.

An alternative storage architecture exists that is drastically less complex than block I/O SANs and is often overlooked in enterprise storage environments. Shared file-based storage, commonly referred to as network-attached storage (NAS), is renowned for its simplicity and ease of use. The fundamental difference is that SANs deal with block I/O, which operates on generic blocks of data, while NAS uses file I/O, which defines data in terms of complete containers of information (files).
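To make the distinction concrete, here is a minimal sketch in C that contrasts the two models. The paths are hypothetical: "/mnt/nas/report.txt" stands in for any file on a NAS-mounted share, and "/dev/sdb" for a SAN-attached raw block device. File I/O names a complete container and lets the storage system find the bytes; block I/O addresses raw blocks by byte offset and leaves all structure to the application.

/* Sketch: file I/O vs. block I/O (hypothetical paths; error handling trimmed).
 * Build: cc -o io_contrast io_contrast.c
 */
#define _XOPEN_SOURCE 500
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];

    /* File I/O (NAS model): name a complete container; the file server
     * decides where the bytes live. "/mnt/nas/report.txt" is a placeholder
     * for any file on an NFS/CIFS-mounted share. */
    int ffd = open("/mnt/nas/report.txt", O_RDONLY);
    if (ffd >= 0) {
        ssize_t n = read(ffd, buf, sizeof(buf));   /* read the start of the file */
        printf("file I/O: read %zd bytes of the named file\n", n);
        close(ffd);
    }

    /* Block I/O (SAN model): address raw blocks by offset; the application
     * (or a local file system / database) must supply all structure.
     * "/dev/sdb" is a placeholder for a SAN-attached block device. */
    int bfd = open("/dev/sdb", O_RDONLY);
    if (bfd >= 0) {
        off_t offset = 512L * 2048;                /* block 2048, 512-byte blocks */
        ssize_t n = pread(bfd, buf, 512, offset);  /* one raw block, no file names */
        printf("block I/O: read %zd bytes at byte offset %lld\n",
               n, (long long)offset);
        close(bfd);
    }
    return 0;
}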

While today the primary view of a file is as an icon on a computer screen accessed via a mouse click, it was not all that long ago that a file was thought of as a manila folder containing a cohesive set of information on some subject. Humans still tend to think in terms of complete containers of data, as opposed to elemental chunks of data.

Why is block I/O inherently more complex than file I/O? By way of analogy, imagine walking into an office building that is one big empty shell, with all the components for cubicles stacked to one side, including the partitions, connectors, desk units, lights, wiring, screws and bolts. Off to the other side are the tools required to assemble the cubicles.

A great many decisions need to be made regarding construction of those cubicles. This includes deciding how many are needed, how they will fit in the office building, the size of the cubicles, the layout, the height of the partitions, where wiring will be run, etc.

There are literally hundreds of decisions that need to be made, and it will take a great deal of work to assemble the cubicles, test all the wiring, lights and so on. This process is analogous to the implementation of block I/O in a SAN environment.

Now imagine walking into an office building with all the cubicles already assembled and wired, ready to be assigned and occupied. This is the world of file I/O. The layout and size of the partitions may not be ideal, but they are ready to use with a minimum of effort and planning.

It is far easier to evaluate a fully furnished office building and figure out how to occupy it than it is to start with raw components and construct the interior from scratch. The latter option would yield a greater degree of efficiency but is far more difficult to implement.

Tradeoffs

With this inherent simplicity, why isn't file I/O more universally deployed to solve the complexity problem in the enterprise? The short answer is performance. File I/O traditionally has not matched block I/O's bandwidth and latency for enterprise applications. For example, high-end database applications typically use block I/O rather than file I/O to achieve the desired performance when grabbing small bits of data (a name, a phone number, a part number) for use in an application. It's a trade-off: users must accept the cost of complexity in order to realize optimum performance for business applications.
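As a rough illustration of the database case, the sketch below (again with a hypothetical device path and an assumed fixed record layout) computes a byte offset from a record number and issues one small read directly against the raw block device, with no file-system layer in the path.

/* Sketch: database-style record lookup over raw block storage.
 * The device path and the fixed 512-byte record layout are hypothetical. */
#define _XOPEN_SOURCE 500
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define RECORD_SIZE 512   /* one record per 512-byte block, for illustration */

int main(int argc, char **argv)
{
    long record_id = (argc > 1) ? atol(argv[1]) : 1000;
    char record[RECORD_SIZE];

    int fd = open("/dev/sdb", O_RDONLY);           /* raw SAN LUN (placeholder) */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* The application, not a file system, maps record number -> byte offset. */
    off_t offset = (off_t)record_id * RECORD_SIZE;
    ssize_t n = pread(fd, record, RECORD_SIZE, offset);
    if (n == RECORD_SIZE)
        printf("fetched record %ld with one small read at offset %lld\n",
               record_id, (long long)offset);
    else
        perror("pread");

    close(fd);
    return 0;
}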

There are currently a number of promising technologies in development that are designed to help address the complexity problem. Some are simply better tools that do things such as increase the amount of storage an administrator can manage with a given amount of work. These solutions are designed to enhance the efficiency of dealing with complexity, but they do not eliminate it. In fact, in the short term, they will likely increase complexity.

Next-generation file I/O and RDMA-based fabrics

Some technologies address complexity in a more head-on fashion. Instead of taking something that is inherently complex and trying to manage that complexity, they start with a technology that is inherently simpler and increase the performance to a level appropriate for enterprise applications.

Such is the strategy of two major initiatives in the storage industry today: the Direct Access File System (DAFS) and Network File System version 4 (NFSv4).

DAFS and NFSv4 are next-generation file I/O protocols that offer the possibility of erasing the complexity of shared enterprise storage pools while providing the performance required for competitive business applications.

RDMA: Improving performance

A fundamental element of both these initiatives is that they are based on a common underlying technology called Remote Direct Memory Access (RDMA). RDMA-based network fabrics allow computer systems to access each other's memory directly, with greater efficiency, exceptionally low latency and no operating system intervention on the data path. All of these elements improve performance.
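A hedged sketch of the mechanism, using the libibverbs programming interface found on InfiniBand and other RDMA-capable fabrics: before a peer can read or write a buffer directly, the buffer must be registered with the adapter, which pins it and returns the keys that later RDMA operations carry. The example assumes an RDMA-capable adapter is present and omits connection setup entirely.

/* Sketch: registering memory for RDMA with libibverbs (link with -libverbs).
 * Assumes an RDMA-capable adapter; queue-pair setup and connection
 * establishment are omitted for brevity. */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ctx ? ibv_alloc_pd(ctx) : NULL;
    if (!pd) {
        fprintf(stderr, "could not open device or allocate protection domain\n");
        return 1;
    }

    /* Register (pin) a buffer so the adapter can DMA into and out of it
     * without involving the operating system on the data path. */
    size_t len = 1 << 20;                      /* 1MB buffer */
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        fprintf(stderr, "ibv_reg_mr failed\n");
        return 1;
    }

    /* The rkey is handed to the peer; it names this memory in the peer's
     * RDMA read/write requests. The lkey is used in local work requests. */
    printf("registered %zu bytes: lkey=0x%x rkey=0x%x\n",
           len, mr->lkey, mr->rkey);

    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}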

RDMA fabrics are gaining popularity and can be implemented today via Fibre Channel, Ethernet and InfiniBand. Standards efforts such as Remote Direct Data Placement (RDDP) and the InfiniBand v1.1 specification will further the deployment of RDMA fabrics.

NFS is the premier industry standard for file sharing between computers and has undergone significant revisions since its introduction almost 20 years ago. Recently, Sun (the original developer of NFS) and Network Appliance announced a collaborative effort to improve NFS yet again so it will run effectively over 10Gbit/sec RDMA fabrics. This collaboration consolidates the companies' key initiatives aimed at delivering the benefits of next-generation file I/O protocols. Both companies will work within the Internet Engineering Task Force (IETF) process to bring RDMA capabilities to current and future NFS versions.

DAFS: Reducing complexity and increasing performance

Even more promising may be DAFS, a protocol created by the DAFS Collaborative, a group of more than 85 companies. The DAFS specification v1.0, which is based partly on NFSv4, was completed in September 2001, and the DAFS API specification v1.0 was finished in November 2001. DAFS has been publicly demonstrated by a number of companies, including Network Appliance and Fujitsu.

DAFS eliminates the need for a local file system (reducing complexity and increasing performance), but more importantly, it eliminates the need for a network transport protocol to be executed on the main computing complex. The protocol most commonly used in NAS environments is TCP/IP, which is well known as a performance bottleneck. Its elimination from storage transactions will significantly improve both bandwidth and latency.

Because RDMA capability is inherent in DAFS, servers and clients can access each other's memory directly, allowing them to conduct very efficient I/O operations.
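To give a sense of what accessing a peer's memory looks like at the verbs level, the fragment below (a generic RDMA sketch, not the DAFS API itself) posts an RDMA read against a remote buffer over an already-connected queue pair. The qp, cq, local_mr, remote_addr and remote_rkey parameters are assumed to come from connection setup that is not shown.

/* Sketch: issuing an RDMA read with libibverbs. Assumes a connected queue
 * pair (qp), its completion queue (cq), a locally registered buffer
 * (local_mr covering local_buf), and the peer's advertised remote_addr /
 * remote_rkey, all produced by connection setup not shown here. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

int rdma_read_remote(struct ibv_qp *qp, struct ibv_cq *cq,
                     void *local_buf, uint32_t len, struct ibv_mr *local_mr,
                     uint64_t remote_addr, uint32_t remote_rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,   /* where the data lands locally */
        .length = len,
        .lkey   = local_mr->lkey,
    };

    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.wr_id               = 1;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.opcode              = IBV_WR_RDMA_READ;    /* pull from the peer's memory */
    wr.send_flags          = IBV_SEND_SIGNALED;
    wr.wr.rdma.remote_addr = remote_addr;         /* peer buffer and its rkey */
    wr.wr.rdma.rkey        = remote_rkey;

    if (ibv_post_send(qp, &wr, &bad_wr))          /* the adapter moves the data; */
        return -1;                                /* the remote CPU is not involved */

    struct ibv_wc wc;                             /* wait for the completion */
    while (ibv_poll_cq(cq, 1, &wc) == 0)
        ;
    return (wc.status == IBV_WC_SUCCESS) ? 0 : -1;
}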

It is rare for a single technology to solve a difficult, varied and widespread problem such as storage complexity. No doubt, the storage industry will see the deployment of many new technologies in an attempt to resolve this complexity.

While groundbreaking in their scope, next-generation file I/O protocols represent new technology and, as such, will take a number of years to be deployed in the data center. Regardless, they will eventually have a dramatic impact on the data center in terms of reducing complexity and increasing performance.

Todd Matters is chief technology officer at InfiniCon Systems Inc.

Copyright © 2003 IDG Communications, Inc.
