The next big thing: SAN policy-based service-level management

At Hudson's Bay Company in Ontario, storage provisioning is often a time-consuming series of handoffs between users and technical support. That's why Laurence Whittaker, supervisor for Enterprise Storage Management Support Services, is looking to implement BMC's Patrol Storage Automated Provisioning software.

Announced at the fall Storage Networking World conference, the product works with BMC's Patrol Storage Manager to take care of the technically complex and error-prone task of storage provisioning: locating available capacity and mapping out a logical path across the SAN between the server and the storage subsystem.

Once the BMC product is implemented, database administrators and other users with limited storage expertise will be able to provision their own storage at the click of a mouse. In addition to conserving storage technicians' time, BMC's offering is expected to cut the lead time for storage provisioning from days to minutes, Whittaker notes.

Automated storage provisioning came of age this year, with product announcements from market leaders such as BMC and Veritas, as well as smaller vendors such as InterSAN, AppIQ, StoreAge Networking Technologies, Provisionsoft and Fujitsu's Softek division.

The next big thing

More importantly, vendors are using the technical underpinnings of these platforms as the basis for what many are calling the next big thing in storage management: policy-based service-level management for the SAN.

As applications share and compete for resources across the SAN, IT managers will increasingly turn to policy-based service-level management to ensure that high-priority applications get the performance and availability to which their SLAs entitle them, says John Webster, senior analyst and founder of Data Mobility Group.

Just down the road, vendors promise, SAN installations can take advantage of true, mainframe-style "lights-out management," in which automated software not only identifies problems but determines and initiates the best course of action, with no human intervention.

The storage provisioning and SL management process works as follows:

  • An autodiscovery tool finds all the physical devices on the SAN, as well as the logical layout of LUNs, volumes and file systems. In automated storage provisioning, the resulting autodiscovery database is used to locate available storage and determine the optimal path between application and storage subsystem.
  • An SL management platform might use autodiscovery to correlate an application's logical path with underlying physical devices, in order to determine the likely source of a response time slowdown or database access failure.
  • An automation engine initiates action through prewritten scripts, and a policy engine determines how the automation engine responds to a particular event. EMC's recently announced Automated Resource Manager (ARM) and Veritas' SANPoint Control 3.5 enable administrators to set up policies that define service levels for a given application, including performance and availability levels, backup and mirroring, and geographic location. When an application requires more capacity, the software automatically locates storage that matches those criteria (see the sketch after this list).
  • SL management platforms connect these components to real-time SAN management tools that generate the alerts the automation engine responds to, such as an overutilized server CPU, a disk or application running out of capacity, or a downed SAN port.
  • On the back end of SL management platforms are SRM tools, SAN configuration tools and other management applications that the automation engine invokes in order to deal with a request or problem.
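
To make the interplay of these pieces concrete, here is a minimal sketch of how a policy engine might match an application's service-level policy against an autodiscovery database when provisioning storage. The data structures and names below are invented for illustration and don't reflect any particular vendor's product.

```python
# Hypothetical sketch: policy-driven provisioning against an autodiscovery database.
# All names and structures are illustrative, not any vendor's actual API.
from dataclasses import dataclass

@dataclass
class StoragePool:
    array: str          # physical subsystem discovered on the SAN
    free_gb: int
    raid_level: str
    mirrored: bool
    site: str

@dataclass
class ServiceLevelPolicy:
    app: str
    raid_level: str
    requires_mirroring: bool
    site: str           # geographic placement requirement

def find_matching_storage(policy, discovered_pools, needed_gb):
    """Return pools from the autodiscovery database that satisfy the policy."""
    return [
        p for p in discovered_pools
        if p.free_gb >= needed_gb
        and p.raid_level == policy.raid_level
        and p.mirrored == policy.requires_mirroring
        and p.site == policy.site
    ]

# A policy engine would pick one match and hand it to the automation engine,
# which runs the prewritten provisioning scripts (LUN creation, zoning, masking).
pools = [StoragePool("ess-01", 500, "RAID-5", True, "toronto"),
         StoragePool("ess-02", 120, "RAID-5", False, "toronto")]
policy = ServiceLevelPolicy("oracle-finance", "RAID-5", True, "toronto")
print(find_matching_storage(policy, pools, needed_gb=200))
```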

The process in action

A typical scenario might go as follows:

  1. The application performance monitoring tool indicates that the Oracle database application is degrading.
  2. The autodiscovery database shows the server on which the Oracle database resides, the logical path between application and storage subsystem, and the physical devices, ports and storage volumes involved.
  3. Using expertise embedded in a knowledge base and performance trending information compiled by SAN monitoring tools, the system determines the likely location of the bottleneck and the appropriate response.
  4. The system then either notifies a human operator or takes action on its own. Responses can range from telling the SRM tool to provision more storage capacity to directing a network load balancer to move an application onto a less busy server. (A simplified sketch of this flow appears below.)
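
A rough sketch of that flow, with invented names, topology and rules rather than any vendor's actual interfaces:

```python
# Hypothetical sketch of the scenario above: an alert is correlated against the
# autodiscovery database, diagnosed via knowledge-base rules, then either
# escalated to an operator or acted on automatically. Everything here is invented.

TOPOLOGY = {  # stand-in for the autodiscovery database
    "oracle-finance": {"server": "dbsrv01", "hba": "hba3",
                       "switch_port": "sw1/12", "volume": "vol_ora_07"},
}

RULES = [  # stand-in for the knowledge base
    (lambda m: m["port_utilization"] > 0.90, "rebalance_paths"),
    (lambda m: m["volume_free_pct"] < 0.05, "provision_storage"),
    (lambda m: m["cpu_utilization"] > 0.95, "move_to_less_busy_server"),
]

def handle_alert(app, metrics, auto_act=False):
    path = TOPOLOGY[app]                      # step 2: map logical to physical
    for condition, action in RULES:           # step 3: diagnose the likely bottleneck
        if condition(metrics):
            if auto_act:                      # step 4: act or notify
                return f"executing {action} for {app} on {path['server']}"
            return f"recommend {action} for {app} on {path['server']}; awaiting operator approval"
    return "no rule matched; escalate to the storage administrator"

print(handle_alert("oracle-finance",
                   {"port_utilization": 0.95, "volume_free_pct": 0.40,
                    "cpu_utilization": 0.60}))
```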

'A long-term imperative'

First Data Corp. is one enterprise that is keeping a close watch on the evolution of SAN SL management offerings.

"A common, policy-based service level management interface is viewed as a long-term imperative across our multivendor storage networks," says Jerome Wendt, a senior information systems analyst at First Data Corp.

Before it meets this imperative, however, the financial services firm must deploy a robust virtualization engine and a storage infrastructure within which to manage and report on a wide range of multivendor storage, network and host devices. Furthermore, vendor products are still maturing, Wendt adds.

Major storage management platform vendors like BMC, EMC, Veritas, Hewlett-Packard and IBM/Tivoli already offer a raft of products to manage SAN, storage and (in the case of companies like BMC, HP and Tivoli) system and IP network devices. These vendors are in the process of integrating their disparate management modules, along with policy and automation engines, into a seamless SL management platform.

BMC, for example, is working with Invio -- whose automation engine it is reselling -- to integrate Patrol Storage Automated Provisioning with Patrol storage and server monitoring and service level management tools. This will let users define application service level objectives as thresholds on SAN devices, then have Patrol monitor those thresholds and send alerts when one is breached.
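
As a hedged illustration of what "service level objectives as thresholds on SAN devices" could look like -- not BMC's or Invio's actual configuration format -- an objective might map each device along an application's path to metric limits that a monitor evaluates:

```python
# Illustrative only: application service-level objectives expressed as per-device
# thresholds. Device names, metrics and limits are invented for the example.
SLO_THRESHOLDS = {
    "oracle-finance": {
        "switch sw1/12":     {"port_utilization_max": 0.80},
        "array ess-01":      {"response_time_ms_max": 20},
        "volume vol_ora_07": {"free_pct_min": 0.10},
    }
}

def evaluate(app, device, metric, value):
    """Emit an alert if a measured value breaches the application's SLO."""
    limits = SLO_THRESHOLDS[app].get(device, {})
    if value > limits.get(f"{metric}_max", float("inf")):
        print(f"ALERT: {app}: {device} {metric}={value} exceeds SLO ceiling")
    elif value < limits.get(f"{metric}_min", float("-inf")):
        print(f"ALERT: {app}: {device} {metric}={value} falls below SLO floor")

evaluate("oracle-finance", "switch sw1/12", "port_utilization", 0.91)
evaluate("oracle-finance", "volume vol_ora_07", "free_pct", 0.04)
```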

EMC's ControlCenter 5.0, announced about a year ago, integrates storage resource management with performance monitoring down to the I/O and database level, as well as utilization monitoring at the user and group levels. Under development is the ability to monitor application performance across the SAN infrastructure.

Meanwhile, several small start-ups have already introduced SAN SL management platforms. These include InterSAN's Pathline, Provisionsoft's DynamicIT and AppIQ's AppIQ Solution Suite.

Most SL management platforms can monitor thresholds on the full range of SAN devices, including servers, HBAs, SAN switches and storage subsystems. Right now, the main automated response most platforms can make is storage provisioning; however, vendors are working hard to expand their platforms' repertoire.

Disparate vendor approaches

Provisionsoft's DynamicIT, for example, monitors server CPU utilization and works with network load balancers to ensure that application service levels are met.

In October, StoreAge Networking Technologies introduced SVM Policy Manager for Microsoft Exchange, which uses a rules-based policy engine to automate storage management tasks such as storage provisioning, data recovery, online rollbacks and backup. Support for Oracle and Lotus Notes is promised by year's end.

Enterprise management vendors like HP, BMC and Tivoli are working toward end-to-end SL management platforms that can identify and automatically respond to server, network or storage problems.

Several vendor spokespeople say they are implementing the Storage Networking Industry Association's Bluefin/SMI-S standard, in hopes of speeding the extension of management support across hosts, HBAs and storage subsystems. Bluefin's common management definitions also give the different tools within an SL management platform a way to exchange and correlate information.

First Data is counting on broad vendor support of the Common Information Model (CIM) as a means of providing common performance information -- either automatically or manually -- across a highly heterogeneous SAN installation, Wendt notes.
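
To show why a common model matters, here is a brief, hypothetical example of pulling capacity data through a standards-based CIM interface. It assumes the open-source pywbem client and a reachable SMI-S/CIM provider; the host, credentials and namespace are placeholders, and the namespace in particular varies by vendor.

```python
# Assumption: a CIM/SMI-S provider is reachable at the URL below and the
# pywbem client library is installed. Because CIM_StorageVolume is a standard
# class, the same query can work against different vendors' subsystems.
import pywbem

conn = pywbem.WBEMConnection("https://smis-provider.example.com:5989",
                             ("monitor", "secret"),
                             default_namespace="root/cimv2")  # vendor-specific in practice

for vol in conn.EnumerateInstances("CIM_StorageVolume"):
    capacity_bytes = vol["BlockSize"] * vol["NumberOfBlocks"]
    print(vol["ElementName"], capacity_bytes, "bytes")
```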

The critical knowledge base

Perhaps the trickiest and most crucial element of an SL management platform is the knowledge base, which uses rules and models to diagnose the likely cause of a problem and determine the most effective response. In a virtualized, enterprise SAN environment, a faulty diagnosis is all too common and potentially lethal, given that "everything is connected and impacts everything else," explains Michael Karp, a senior analyst at Enterprise Management Associates.

Plus, Karp adds, "The rules can't be too generalized; you need some generic rules defined by vendors, plus local ones defined by users. Furthermore, once you've decided how to react to a situation, you need to figure out how the action will affect everything else, meaning, for example, how an unscheduled server backup will affect response time for other applications."
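
A toy sketch of that point -- vendor-supplied generic rules layered with locally defined ones, plus a crude check of how a candidate action would affect other applications -- might look like this (all rules and names are invented):

```python
# Illustrative only: combining generic vendor rules with local, user-defined ones,
# then vetting the chosen action for side effects before it runs.

VENDOR_RULES = [
    {"when": "volume_free_pct < 0.05", "action": "provision_storage"},
]
LOCAL_RULES = [
    # Site-specific: defer unscheduled backups during business hours, because
    # they degrade response time for other applications on the same array.
    {"when": "backup_requested and 9 <= hour < 17", "action": "defer_backup"},
]

def choose_action(context):
    for rule in LOCAL_RULES + VENDOR_RULES:    # local rules take precedence
        if eval(rule["when"], {}, context):    # toy rule evaluator, for illustration
            return rule["action"]
    return "notify_operator"

def impact_ok(action, apps_sharing_resource):
    """Crude impact check: block disruptive actions when other apps share the resource."""
    disruptive = {"unscheduled_backup", "volume_migration"}
    return action not in disruptive or not apps_sharing_resource

ctx = {"volume_free_pct": 0.03, "backup_requested": False, "hour": 14}
action = choose_action(ctx)
print(action if impact_ok(action, apps_sharing_resource=["oracle-finance"]) else "escalate")
```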

According to Bob Rogers, chief storage technologist at BMC, "A large number of customers are reluctant to use automation today because of the very real possibility that the automated response will make the problem worse instead of better, either because the system doesn't have enough information, or the rules are too limited in scope."

Notes Barry Ader, manager of the open software marketing program at EMC, "Users love automation and see the need for it, but they want to take baby steps towards trusting the software and trusting themselves to set up proper rules and thresholds."

The path to 'true automation'

Vendors such as EMC and Veritas are trying to lead their customers toward true automation incrementally: Their management platforms can perform automated provisioning with no human intervention, or they can use wizards that walk the user through the provisioning process step by step, asking, "Is this OK?" along the way.
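
That pattern can be expressed in a few lines: one provisioning routine that either runs unattended or pauses for confirmation at each step. The steps and prompt below are illustrative, not any vendor's actual wizard.

```python
# Minimal sketch of the "baby steps" approach: the same routine runs fully
# automated or asks for operator approval before each step. Steps are invented.
PROVISIONING_STEPS = ["create LUN", "zone switch ports", "mask LUN to host",
                      "extend host volume group"]

def provision(app, interactive=True):
    for step in PROVISIONING_STEPS:
        if interactive and input(f"{app}: about to '{step}'. Is this OK? [y/n] ") != "y":
            print("Stopped by operator.")
            return
        print(f"executing: {step}")
    print(f"Provisioning complete for {app}.")

# provision("oracle-finance")                     # wizard mode: confirm each step
# provision("oracle-finance", interactive=False)  # fully automated: no prompts
```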

It still may be some time before businesses adopt policy-based service level management wholeheartedly. Whittaker says that many colleagues he has spoken with are still implementing a virtualized enterprisewide SAN environment that can support such capabilities.

Hudson's Bay, while primarily an IBM Enterprise Storage Server and McData shop, still has a fair number of direct-attached storage devices, Whittaker says. Companies also need to have in place the logical volume management and data migration tools that can scale storage up or back, and move data "dynamically and transparently," in response to alerts, he says.

Hudson's Bay hopes to start implementing automated, policy-based SL management this year.

Elisabeth Horwitt is a freelance writer in Waban, Mass.

Copyright © 2003 IDG Communications, Inc.
