Normalizing the SAN

In today's complex storage area network (SAN) environments, one of the most confusing terms is the oft-quoted but often misapplied virtualization. Complicating matters, it is referred to as either SAN virtualization or storage virtualization. As a final twist, vendors tell us this virtualization can reside at the host level, the network level or the storage array level. And, of course, each says its method is the best and will win out over all the others.

So what is an overworked systems manager with limited knowledge of the subject to do with yet another six-syllable technology? The stakes are real: the wrong choice could cost you time, money and, worse yet in this economy, your job, while the right choice could save time and money and might even earn you a promotion. Surprisingly, arriving at the right virtualization choice requires a basic relational database design technique: normalization.

Defining the terms

To begin with, let's define the terms introduced earlier and then arrive at the logical answer. First, let's analyze the two different virtualization phrases: SAN virtualization and storage virtualization.

Frankly, SAN virtualization is a meaningless term. Virtualizing a SAN could translate into any number of concepts. Since a SAN encompasses the entire data path, from the time the data leaves the server until the data returns from the storage, one could conceivably virtualize any part of it, whether the server, the switch or the storage. The phrase SAN virtualization is therefore too vague to be useful in this environment.

Therefore, for the purposes of this article, only the term storage virtualization will be used. It has a clear meaning and conveys the purpose of virtualization in a SAN: to present the physical disk volumes in the storage arrays to the servers as logical entities.

Now for the definitions of the three most common types of virtualization: host-based, network-based and array-based.

Host-based virtualization is accomplished by putting a software agent on each server on the SAN. This software will, in theory, manage and share the storage among the servers on the SAN that have the agent loaded. The press also often refers to this approach as out-of-band virtualization. Veritas Software Corp. is the vendor most often associated with it in the open systems space.

Network-based virtualization is accomplished by placing a device running virtualization software between the servers and the storage. The press also refers to this approach as in-band virtualization. All data that travels on the SAN between the servers and the storage must pass through this device, which recognizes all of the storage and servers and presents the storage to the servers as logical volumes. Originally championed by a number of small companies, most notably DataCore, FalconStor and StorageApps, this approach has since been formally adopted by Hitachi Ltd., IBM and Fujitsu Software Technologies (Fujitsu Softek) as their strategic direction.
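To make "presents the storage to the servers as logical volumes" more concrete, the following Python sketch shows the kind of address translation an in-band device might perform on each I/O: a volume the server sees as one disk is stitched together from extents on different physical arrays. The names (`Extent`, `LogicalVolume`, `resolve`), the array labels and the simple concatenated layout are hypothetical illustrations, not any vendor's implementation; real products also deal with striping, caching and failover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical model: a contiguous run of blocks on one physical LUN."""
    array: str   # label for the physical storage array
    lun: int     # LUN number on that array
    start: int   # first physical block of the extent
    length: int  # number of blocks in the extent


class LogicalVolume:
    """Hypothetical volume presented to a server, built by concatenating extents."""

    def __init__(self, name, extents):
        self.name = name
        self.extents = list(extents)

    @property
    def size(self):
        # The server sees one volume whose size is the sum of its extents.
        return sum(e.length for e in self.extents)

    def resolve(self, lba):
        """Translate a logical block address into (array, lun, physical block)."""
        if not 0 <= lba < self.size:
            raise ValueError(f"LBA {lba} is outside volume {self.name}")
        offset = lba
        for e in self.extents:
            if offset < e.length:
                return (e.array, e.lun, e.start + offset)
            offset -= e.length  # skip past this extent and keep walking
        raise AssertionError("unreachable: size check guarantees a match")


# A server sees "db01" as one 1,000-block disk spread across two arrays.
vol = LogicalVolume("db01", [
    Extent("array-A", lun=3, start=1000, length=500),
    Extent("array-B", lun=7, start=0, length=500),
])
print(vol.resolve(10))   # ('array-A', 3, 1010)
print(vol.resolve(600))  # ('array-B', 7, 100)
```

Because the mapping lives in the device rather than on each server, the servers never need to know which array actually holds a given block.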

The third and most common approach is array-based virtualization. Here, an external cabinet houses all of the disk drives and uses its own internal software to virtualize the disks within it. This approach was originally championed and popularized by EMC Corp. in the 1990s, though other disk vendors, such as IBM and Hitachi Data Systems, now offer many of the same features as EMC.

So the question is: of these three technologies, which is the right choice? Here, the right choice is the one that is the most cost-effective, lets you use the investments you already have in place and still scales for future growth. Ideally, it will also simplify storage administration.

A quick history of SANs

To answer that, one must understand a bit of the history of SANs in the data center. Today's SANs have their roots in the mainframe environment, where storage had become unwieldy to manage because it was internal to the system. To solve this problem, someone had the bright idea of attaching the storage externally. External attachment solved part of the problem, but not all of it: the storage, while physically separate, was still logically associated with a single mainframe. Hence the development of ESCON directors, physical devices in the data path that allow a storage array to be shared among multiple mainframes or a single mainframe to access multiple storage arrays.

The open systems model has followed much of the same path to this point. However, a perplexing question has arisen for many. Why did a model that works fine for the most part in the mainframe world break down so quickly in the open systems world? The explanation is best given using terms borrowed from the relational database realm.

In the mainframe world, the relationship between mainframes and storage arrays worked because it is a 'one-to-many' relationship. The term comes from relational database design, where it describes a relationship in which one record in one table is associated with many records in another, such as one customer and that customer's many orders. In the mainframe world, the 'one' is the operating system, MVS. The 'many', at least in this example, is the many storage arrays, be they from EMC, HDS or IBM.

So, in the above example, even though many physical mainframes may exist, the 'one-to-many' relationship works reasonably well because there is a single logical OS on all of the mainframes managing the many storage arrays. This single OS is intelligent enough to manage the storage no matter where it resides in the mainframe SAN world.

The management nightmare

Now enter the world of today's open systems SANs. Multiple operating systems (Sun, AIX, Novell, Windows NT/2000 and various flavors of Linux) must connect to storage from multiple providers (EMC, Compaq, Dell, Sun, Hitachi, IBM, Xiotech, StorageTek, Fujitsu). Toss in multiple providers of the directors in the middle (Brocade, McData, Inrange, Vixel), then compound the complexity with the different hardware platforms these operating systems run on. All in all, it translates into a management nightmare, and a very real one for most organizations.

Now let's translate this scenario into database terms. This nightmare reflects a 'many-to-many' relationship, in which many operating systems must work with many types of storage. To a relational database designer, a many-to-many relationship left in that form is unmanageable. However, a relational database technique called normalization makes it manageable.

In relational database design, normalization resolves a many-to-many relationship by creating an additional table, often called a junction or intersection table, that splits it into two of the 'one-to-many' relationships described earlier. This new table converts the previously unmanageable data into a manageable format.
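As a concrete illustration of the database side of the analogy, here is a minimal sketch using Python's built-in `sqlite3` module. The junction table `server_storage` turns the many-to-many link between servers and arrays into two one-to-many relationships. All table names, column names and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE server (id INTEGER PRIMARY KEY, os TEXT NOT NULL);
    CREATE TABLE storage_array (id INTEGER PRIMARY KEY, vendor TEXT NOT NULL);
    -- Junction table: each row pairs one server with one array. The
    -- many-to-many relationship becomes two one-to-many relationships:
    -- server -> server_storage and storage_array -> server_storage.
    CREATE TABLE server_storage (
        server_id INTEGER NOT NULL REFERENCES server(id),
        array_id  INTEGER NOT NULL REFERENCES storage_array(id),
        PRIMARY KEY (server_id, array_id)
    );
""")

# Illustrative sample data, not taken from any real environment.
conn.executemany("INSERT INTO server VALUES (?, ?)",
                 [(1, "AIX"), (2, "Windows 2000"), (3, "Linux")])
conn.executemany("INSERT INTO storage_array VALUES (?, ?)",
                 [(1, "EMC"), (2, "Hitachi"), (3, "IBM")])
conn.executemany("INSERT INTO server_storage VALUES (?, ?)",
                 [(1, 1), (1, 3), (2, 2), (3, 1), (3, 2)])

# Which arrays does the Linux server use? Follow both one-to-many links.
rows = conn.execute("""
    SELECT a.vendor
    FROM server AS s
    JOIN server_storage AS l ON l.server_id = s.id
    JOIN storage_array AS a ON a.id = l.array_id
    WHERE s.os = ?
    ORDER BY a.vendor
""", ("Linux",)).fetchall()
print(rows)  # [('EMC',), ('Hitachi',)]
```

Neither the `server` table nor the `storage_array` table refers to the other directly; every pairing goes through the junction table, which plays the same role the article assigns to the new network layer.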

Network-based virtualization

This same technique must be applied to the SAN to make it manageable. Applied to the SAN, however, it produces not a new table but a new network layer, which converts the SAN from its present unmanageable state into a very manageable one. This new network layer is the network-based virtualization model.

Let's apply this to the open systems environment already described. You already have many servers running Sun, AIX, NT/2000, Linux, Novell and Apple operating systems. Now introduce a new device in the data path at this new network layer so that all data traffic passes through it. Configure the device so that it sees all of the servers and all of the servers can discover it. You have now introduced a 'one-to-many' relationship on one half of the SAN.

On the storage end, connect this new device to the many types of storage you may have, whether they are from EMC, Hitachi, IBM, Dell or Compaq. Again, configure the device so that it sees all of the storage and all of the storage can see it. This introduces a second 'one-to-many' relationship on the other half of the SAN.

With these two new 'one-to-many' relationships in place, you have transformed the SAN into the manageable design described above, simplifying it in the process. Using this proven relational database technique, one can see why the network-based virtualization strategy is the only logical and sensible choice for making SANs manageable. It also helps explain why major vendors are adopting this method as their long-term strategy.
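One rough way to see why two one-to-many relationships simplify things is to count the server-to-storage pairings an administrator must configure and track. The sketch below is a back-of-the-envelope model under stated assumptions: every server needs access to every array, and each pairing counts as one unit of management effort. It ignores switches, zoning and multipathing, and the function names are hypothetical.

```python
def direct_pairings(servers: int, arrays: int) -> int:
    """Every server configured against every array: a many-to-many mesh."""
    return servers * arrays


def layered_pairings(servers: int, arrays: int) -> int:
    """One virtualization layer in the middle: each server and each array
    is configured once, against the layer (two one-to-many relationships)."""
    return servers + arrays


for n, m in [(5, 3), (20, 6), (100, 9)]:
    print(f"{n} servers, {m} arrays: "
          f"{direct_pairings(n, m)} direct vs {layered_pairings(n, m)} via the layer")
```

With 20 servers and 6 arrays, for example, this gives 120 direct pairings versus 26 through the layer. The model only counts relationships; the layer itself still has to maintain the logical-to-physical mappings internally.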

Vendors that pursue either of the other two main virtualization models, host-based or array-based, in the open systems environment will have only limited success. Their solutions will be highly proprietary and expensive, difficult to administer despite the cost, and hard to scale in an enterprise environment.

These other models can work, but only if a very few individuals control the purchasing and management of the SAN's components. That is likely to create a bottleneck in the organization. It also does not reflect how most organizations operate, and most organizations do not want this much control in the hands of so few.

Hence, the network-based virtualization model is the only logical choice. While this model may be disparaged at times in the popular press, it will be the model that emerges, if for no other reason than that it has to. In fact, major enterprise providers such as IBM, Hitachi Ltd., Fujitsu Softek and Veritas appear to have already come to the same conclusion.

Hopefully, the term virtualization is now clearer, and the arguments presented show that, of the three models, network-based virtualization is the only one that makes sense long term. It should, in theory, ease the management burden of the SAN while also opening up whole new ways of thinking about storage and storage networking in the open systems arena.

But perhaps more important to systems managers, it should allow them to use what they already have more effectively. It should also save money in the long term without requiring a great deal of new spending in the short term. It is the best long-term strategic decision because you can start working toward it now, spending less by purchasing the products you need more wisely. And, as a side benefit after all is said and done, you might just get to keep your job.

Jerome M. Wendt is a senior SAN analyst with First Data Resources.

Copyright © 2002 IDG Communications, Inc.
