SAN on the move

There's no doubt SANs can get complicated quickly

Because of the large-scale growth in data, many of today's storage-area networks (SAN) have evolved considerably from their early, well-contained deployments. Corporate IT executives are pushing SANs to their limits -- in size, complexity and functionality -- as they embrace New Data Center mandates about tiered storage, storage on demand and delivering storage as a service to key internal customers.

Today's enterprise SAN can hold multiple terabytes of data and may comprise thousands of complex network interconnections. Forward-looking IT organizations have looked for new and creative ways of managing the scale and complexity that go hand in hand with sprawling, networked storage environments. Increasingly, they are employing SAN change-management tools -- often a subset of a larger storage resource management (SRM) arsenal -- to meet the New Data Center's stringent performance demands.

To varying degrees, IT executives are using today's crop of SAN change-management tools to navigate and keep tabs on the labyrinth of dual-redundant paths that exist on a SAN from host to storage array. The tools typically let administrators follow the trail of these interdependencies, which goes everywhere, from hosts and host bus adapters (HBA) to Fibre Channel switches, ports and storage arrays -- even down to the level of logical unit numbers (LUN) and virtual volumes carved out of individual disk drives.

Many SAN change-management tools offer detailed device discovery and topology mapping for homogeneous and multivendor SANs. Some tools also offer real-time system monitoring, troubleshooting and change-violation alerts, functions that directly link to a SAN change-history database they maintain.

Some tools make an effort to follow IT Infrastructure Library (ITIL) best practices for maintaining and updating changes, says John Webster, analyst and partner with the Data Mobility Group. Storage vendors with a strong ITIL focus include IBM and Hitachi Data Systems, he says.

Other tools go a step further and provide predictive change-management functionality. With these tools, storage managers can perform sophisticated what-if analyses and modeling, from which they learn the impact of potential SAN changes before rolling them into production.

Using tools with change-tracking and -monitoring abilities has begun to prove critical for a variety of high-growth SAN environments, including organizations moving to embrace service-centric delivery.

Babu Kudaravalli, director of enterprise systems at Port Washington, N.Y.-based National Medical Health Card Systems (NMHC), provides some healthy perspective on managing the new realities -- and risks -- of today's complex SAN deployments. His midsize company, which manages pharmacy benefits and processes drug cards, has grown considerably during the past five years.

NMHC's storage network has grown five-fold since Kudaravalli joined the company four years ago. Now it houses about 65TB to 70TB of data and consists of an HP StorageWorks XP1024 disk array, an XP128 array at a remote site, a few StorageWorks EVA2000 systems and nearly 400 Fibre Channel ports on Cisco MDS switches. It includes NMHC's primary data center in Port Washington and its disaster-recovery site in a neighboring city.

NMHC acquired the StorageWorks arrays under a pay-per-use model, Kudaravalli says. Given this, he says he was particularly interested in tracking system capacity. He wanted to know how much storage each application server used, and how much free capacity was available across the SAN.

Previously, this meant developing and running custom scripts, Kudaravalli says. Now, using HP Storage Essentials (formerly from AppIQ), he can extract the information with just one press of a button. Also gone is the tedious work of maintaining the Visio diagram that depicted relationships between the SAN's components. Instead, Storage Essentials creates and updates a network map on one screen that shows how the StorageWorks EVA box is connected to the switch and all the hosts, he says.
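To make the capacity question concrete, here is a minimal sketch of the sort of custom script Kudaravalli describes. It assumes a hand-maintained inventory of LUN allocations; the host names, array names, sizes and record layout are all hypothetical, and nothing here reflects NMHC's actual scripts or how Storage Essentials gathers its data.

```python
from collections import defaultdict

# Hypothetical inventory: (host, array, allocated_gb). Illustrative values only.
ALLOCATIONS = [
    ("app01", "xp1024", 500),
    ("app01", "eva-a", 250),
    ("app02", "xp1024", 1000),
    ("db01", "eva-a", 750),
]

# Hypothetical usable capacity per array, in GB.
ARRAY_CAPACITY_GB = {"xp1024": 4000, "eva-a": 2000}


def usage_by_host(allocations):
    """Sum allocated capacity per application host."""
    totals = defaultdict(int)
    for host, _array, size_gb in allocations:
        totals[host] += size_gb
    return dict(totals)


def free_by_array(allocations, capacity):
    """Subtract what has been allocated from each array's total capacity."""
    used = defaultdict(int)
    for _host, array, size_gb in allocations:
        used[array] += size_gb
    return {array: total - used[array] for array, total in capacity.items()}


if __name__ == "__main__":
    for host, gb in sorted(usage_by_host(ALLOCATIONS).items()):
        print(f"{host}: {gb} GB allocated")
    for array, gb in sorted(free_by_array(ALLOCATIONS, ARRAY_CAPACITY_GB).items()):
        print(f"{array}: {gb} GB free")
```

The hard part in practice is keeping the inventory current, which is the chore a discovery-based tool takes over.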

Keeping tabs on this complex SAN environment and trends in application-specific usage is much different from the days of servers with direct-attached storage, Kudaravalli says. "Most people still see storage as a disk drive attached to a server box. But it's not when you are talking about hundreds of hosts and terabytes of storage all needing to be aligned and planned in order to prevent performance issues," he adds.

A company needs to be careful even when determining which ports to assign to which application hosts, he says. "You can't connect all your lights to the same power source, or you'll blow a fuse. The same thing applies to managing and redistributing the load of applications correctly among the various Fibre Channel ports in use on your SAN. Storage Essentials helps us identify which ports make sense to use, based on their load."
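The port-selection idea can be sketched in a few lines. The example below assumes per-port utilization figures are already available as fractions of line rate; the port names, load values and the 60% ceiling are hypothetical, and this is not a description of how Storage Essentials ranks ports.

```python
# Hypothetical utilization per Fibre Channel port, as a fraction of line rate.
PORT_LOAD = {"fc1/1": 0.72, "fc1/2": 0.15, "fc2/1": 0.40, "fc2/2": 0.22}


def least_loaded_ports(port_load, count=2, ceiling=0.6):
    """Return up to `count` ports with the lowest load, skipping any at or above `ceiling`."""
    candidates = [(load, port) for port, load in port_load.items() if load < ceiling]
    # Sorting the (load, port) tuples orders by load first, then by port name.
    return [port for _load, port in sorted(candidates)[:count]]


if __name__ == "__main__":
    print(least_loaded_ports(PORT_LOAD))
```

A real placement decision would also weigh which switch and fabric each port sits on, so that a host's two paths do not land on the same hardware.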

The SAN management dilemma

There's no doubt SANs can get complicated quickly, Data Mobility's Webster agrees. "With a starting SAN configuration, companies find themselves almost immediately adding more to it -- on an HBA, switch and port level. It just starts to grow," Webster says. "As organizations scale upwards, they want to know, 'Are we creating potential performance problems?' or 'Are we adding exposure to outages as we scale the system?'"

Even the smallest misstep while provisioning new storage in a SAN environment, running SAN cable, allocating ports, or commissioning and decommissioning servers can have a significant ripple effect that may lead to production application slowdowns or even downtime.

In a recent survey of its customers -- representing hundreds of companies that collectively support more than a million e-mail users -- business continuity services vendor MessageOne found that 16% of all e-mail outages were caused by SAN failures. The company, which published the survey results in a report on why e-mail fails, also noted SAN-related failures typically knock e-mail out of service for an average of 25.5 hours. Survey respondents attributed SAN-related outages to factors such as incorrectly configured LUNs, out-of-date drivers and administration of physical hardware by teams outside the messaging group.

"Eighty percent of problems in the SAN are a result of someone making a change to the system and doing something wrong," says Bryan Semple, vice president of marketing at Onaro, which makes SANscreen storage services management software. He recounts cases where SAN administrators made zoning changes that had undesired ripple effects, and relates the experiences of one customer, a lone SAN administrator at a healthcare organization, who has been using SANscreen to keep track of the moves made by 10 Windows administrators. This company's IT staff is prone to shutting down servers or pulling out HBAs with unexpected consequences for the SAN. In that case, Semple says, the SAN administrator views the use of SAN change-management software like SANscreen as something of a leveling product that gives him the chance to keep up with the SAN-related changes made by other teams.

Changing the process

Also no stranger to tracking changes on the SAN is Jake Roersma, manager of network engineering at Priority Health, an HMO in Grand Rapids, Mich. Roersma oversees a 125TB SAN with a three-tier storage architecture. The SAN consists of a high-performance HP XP12000 array at Tier 1, an HP EVA5000 system at Tier 2, and an HP MSA1500 at Tier 3, which is mostly reserved for backup and archiving. Priority Health's SAN also has approximately 512 ports on a mix of Cisco MDS 9509 and 9216i switches.

Like Kudaravalli, Roersma saw his SAN's size more than double in the last year and a half. That growth made it necessary to hire a second SAN administrator, adding to management complexity. "Once you get more than a single SAN administrator working, you could find one administrator changing one piece and another SAN administrator changing another piece. In certain circumstances, those changes could conflict. Monitoring that manually would take hours and hours to track down what might have changed," he says.

On occasion, Roersma says, Windows and Unix engineers would question his team about whether performance issues they had been experiencing could relate to SAN changes. Indeed, LUN-masking or port-level, hard zoning changes often were the culprits, because they typically dealt with which hosts could access which LUNs on the system, he says. Problems with Windows clusters combined with Linux or Unix hosts also surfaced, as well as issues surrounding breaks in the dual-attached multipathing offered as a high-availability service to applications on the SAN.

Manually tracking and troubleshooting all the elements in the SAN configuration, and the subsequent changes to them, was just not feasible. That's why Roersma became interested in Onaro's SANscreen tool, after he saw a demonstration of how it worked at a storage industry conference.

"One of Onaro's strong points is the dual-path issue. All but one of the hosts we have are dual-attached. SANscreen shows us all the way through to the storage unit whether or not we have a redundant path," he says. "It will tell us if a host doesn't have an HBA, or a switch is connected twice. It also will tell us, at the storage level, if the storage has 10 ports connected and zoned to see the host, but the LUNs we've allocated aren't allocated on more than one port."
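The redundancy check Roersma describes can be illustrated with a small sketch. It assumes each host's paths to a LUN are recorded as (HBA, switch, storage port) tuples and treats a host as redundant only if two of its paths share none of those components. The data model, names and rule are hypothetical simplifications, not SANscreen's actual logic.

```python
from itertools import combinations

# Hypothetical paths from each host to one LUN: (hba, switch, storage_port).
PATHS = {
    "app01": [("hba0", "sw-a", "ctl-a:1"), ("hba1", "sw-b", "ctl-b:1")],
    "app02": [("hba0", "sw-a", "ctl-a:1"), ("hba1", "sw-a", "ctl-b:1")],  # shared switch
    "db01": [("hba0", "sw-a", "ctl-a:1")],  # single path
}


def has_redundant_path(paths):
    """True if some pair of paths differs in HBA, switch and storage port alike."""
    return any(
        all(a != b for a, b in zip(p, q))
        for p, q in combinations(paths, 2)
    )


def hosts_without_redundancy(paths_by_host):
    """List hosts where a single component failure could cut off the LUN."""
    return sorted(
        host for host, paths in paths_by_host.items()
        if not has_redundant_path(paths)
    )


if __name__ == "__main__":
    print(hosts_without_redundancy(PATHS))
```

Here app02 is flagged even though it has two HBAs, because both paths run through the same switch. That kind of break is easy to miss when tracking paths by hand.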

Today Roersma and his team frequently use SANscreen's verification tool to model SAN changes, such as allocating storage or modifying zones, and flag any possible problems. "It's evident as we grow that it's improved our process. In fact, a lot of our process is now designed around how the Onaro tool functions. We wanted a tool that monitors the changes. But the verification tool is actually now built into our change process. So we implement that prior to our allocating storage or modifying zones," Roersma says.

SANscreen also comes in handy for compliance, Roersma adds. "Auditors will say, 'We see in the change control tool that a request for a change of X amount of storage was made on this host. Can you show us that you made this change and how it was made?'" Roersma says. "My people will then go through the Onaro tool and basically give them screenshots that show that eight LUNs of X amount in size were added to the storage unit, that the LUN-masking to allow X host was made to those LUNs, and they were added to such and such a zone."

Finding the right tool

Whatever flavor they are, in the end, SAN change-management tools tend to serve as something of an auditing tool that helps people identify what they have, how quickly things are changing and how fast they are growing, says Shawn Wagner, a storage specialist at reseller CDW.

Stephen Foskett, director of strategy services at GlassHouse Technologies, agrees.

"Most IT systems have what's called syslog support to log system events. Most storage systems do, too," Foskett says. "One of the things SAN change management products offer is stepping beyond just logging actions to actually correlating actions and their effects across the entire storage infrastructure. If someone makes a zoning change on a switch or a LUN-masking change, the system will record that somebody made that change, but also the whole effect of it across the SAN."

Foskett visits a lot of Fortune 500 customer sites, including one he went to recently housing nearly 100 SAN switches and hundreds of storage arrays. Change-management tools are essential for keeping a site of that size running smoothly, he says.

He also sees the use of these tools as important for many customers who are now trying to move their IT organization into more of an internal service provider framework, sometimes even to the point of offering an assembly line model with different preset classes or tiers of storage to address the needs of various lines of business. "Larger companies are starting to see that the only way they can manage their environments is if things are increasingly standardized in process and configuration," he says.

Because the price puts some of these tools in the BMW rather than the Volkswagen category, Foskett says IT organizations must carefully weigh the benefits against the cost. "Most tools are priced based on the size of the environment, but I've seen the tabs for these solutions in the $50,000-plus range," he says.

Applied Quantitative Research in Greenwich, Conn., is one company that decided it couldn't justify the investment. Syslog monitoring, combined with Cisco MDS switch-related device and fabric management tools, is sufficient for the 30TB SAN at the primary data center, says Ismail Coskun, the investment and asset-management firm's systems development manager. The SAN uses an EMC Clariion at its production location with 48 Fibre Channel ports on an MDS switch, and replicates data to another Clariion in place at the company's remote customer site.

Cisco's Device Manager tool works well to identify where the company's more than 20 servers are connected to the SAN, Coskun says. "Device Manager offers a nice graphical view of what the switch looks like -- up- and downstream and its data-throughput speed," he says.

When evaluating SAN change management tools, GlassHouse's Foskett recommends looking for packages that support all the equipment in place on the SAN. He also values tools that can take a snapshot of the SAN configuration before and after a change is made, and offer a configuration file should the user need to revert to the prechange configuration. Some tools even handle the rollback automatically.
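The before-and-after snapshot idea can be shown with a short sketch. It assumes the zoning configuration is held as a mapping of zone name to a set of members; the zone names, member names and data structure are hypothetical and do not correspond to any vendor's configuration file format.

```python
import copy


def snapshot(config):
    """Take an independent copy of the configuration for later comparison or revert."""
    return copy.deepcopy(config)


def diff(before, after):
    """List (action, zone, member) tuples describing how `after` differs from `before`."""
    changes = []
    for zone in sorted(set(before) | set(after)):
        old = before.get(zone, set())
        new = after.get(zone, set())
        for member in sorted(new - old):
            changes.append(("added", zone, member))
        for member in sorted(old - new):
            changes.append(("removed", zone, member))
    return changes


if __name__ == "__main__":
    zones = {"zone_app01": {"hba0", "ctl-a:1"}}
    before = snapshot(zones)

    # A hypothetical change: one member dropped, one new zone created.
    zones["zone_app01"].discard("ctl-a:1")
    zones["zone_db01"] = {"hba2"}
    print(diff(before, zones))

    # Reverting is a matter of restoring the saved snapshot.
    zones = snapshot(before)
    print(diff(before, zones))
```

Keeping the pre-change snapshot around is what makes a rollback possible, whether an administrator applies it by hand or the tool does it automatically.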

Lastly, Foskett stresses looking for tools that offer a wide range of flexibility in terms of what they can do. The best tools, for example, can aid in the process of adding new devices or even merging SANs together. "After all," he says, "those are the things you are going to be doing in the next couple of years."

Hope is a freelance writer who covers IT issues surrounding enterprise storage, networking and security. She can be reached at mhope@thestoragewriter.com.

This story, "SAN on the move," was originally published by Network World.

Copyright © 2006 IDG Communications, Inc.

  