NAS virtualization eases job of electronic discovery

Ibis Consulting Inc., a provider of electronic discovery and compliance solutions in Providence, R.I., considers itself a "data factory," wherein data is not only the raw material, but also a vital tool and a finished product.

Each month, the company collects and processes multiple terabytes of data for its clients, locating specific pieces of information for use in legal cases, regulatory compliance or administrative applications.

"Many times, we find the smoking gun in a corporate malfeasance case that implicates or exonerates a particular witness," says Cliff Dutton, executive vice president and chief technology officer at the firm.

The data that Ibis Consulting deals with is not exactly organized — it arrives on stacks of disks, CDs or DVDs, on hundreds of tapes, or electronically. It is in structured and unstructured formats, and it may be virus-infected from e-mail that wasn't properly administered.

All this data, in all its forms and formats, has to be sorted, organized and transformed into formats that can be loaded into a document management system that enables indexing, searching and other analysis that supports discovery. The most relevant information must then be culled.

"We're like a large data factory, where we take in raw material, process it and send refined data out," Dutton says. And the company has to do so quickly. "Our clients don't have time on their side," he says. "They have deadlines and obligations to present information either to the government or to the courts."

Two pillars: Throughput and efficiency

Given the amount of data and the urgency of the deadlines, it's easy to imagine the importance of system throughput and administrative efficiency to Ibis' IT team.

"Anything that enhances our ability to process information more quickly is a good solution," Dutton declares. "Our applications that do the indexing and culling and conversion of data into a form reviewable by counsel have to run as efficiently as they can, but they can't be any more efficient than the underlying infrastructure they run on."

Ibis upgraded its storage infrastructure recently, installing a 200TB, ATA- and Fibre Channel-based, network-attached storage array from BlueArc Corp. to store data for its 250-server farm. But as scalable and fast as BlueArc's Titan array was, Ibis still experienced two difficulties. First, administration of the NAS array became labor-intensive when the data related to a particular case grew beyond the limits for which it was originally provisioned.

"You can't always tell how big a case will become," Dutton explains. "You could get one CD of data, and then the case explodes and you have to expand the project." If that happened, IT administrators had to manually reconfigure the data shares to span multiple physical shelves within the array, or they had to provision new storage and manually migrate data sets across the new systems.

Second, if all the data associated with a particular project was on one set of disks or one physical shelf within the array, there were times during the processing of the project that a large number of CPUs were addressing that single location at one time, causing a risk of I/O bottlenecks.

Because these two concerns — efficiency and throughput — were at the top of Dutton's priority list, he knew he had to take action to mitigate any problems in those areas.

A new road

Dutton chose a route that's getting increasing attention: NAS virtualization. Storage virtualization in the SAN world has been around for some time and is realizing new-found popularity, as heavy-hitters such as IBM, Hitachi, Sun and EMC release products that enable physically separate and heterogeneous storage arrays to appear as a single logical pool of storage resources. In this environment, data flows freely among the various tiers and types of storage — depending on business needs — without disrupting the operating environment.

But NAS virtualization is a newer technology that is only now gaining attention, particularly since EMC acquired NAS virtualization specialist Rainfinity this summer. Other NAS virtualization vendors include Acopia Networks, NeoPath Networks and NuView Inc.

According to Brad O'Neill, senior analyst and consultant at the Taneja Group in Hopkinton, Mass., while SAN virtualization works at the block layer, NAS virtualization works at the file layer, enabling users to unify multiple file systems on multiple NAS machines. As a result, they appear as one machine from a presentation standpoint.

Adds Tony Asaro, senior analyst at Enterprise Strategy Group in Milford, Mass., "If you have dozens or hundreds of file systems, it gets hard to manage and even navigate through them. NAS virtualization creates a way to see them as a single shared drive so it's easier to manage, share, edit and move the data."

Virtualization pool

Ibis Consulting chose Acopia Networks' Adaptive Resources Switch (ARX), which aggregates heterogeneous file storage devices and provides one consolidated access point, or "global namespace," to the storage pool. Instead of addressing the BlueArc Titan array directly, administrators work through the ARX device, which sends commands to the array.

Now, "when we move the path names, we don't have to change the software to accommodate it — that's Acopia's job," Dutton says. "It says, 'You give me the path name, and I'll manage it to where it physically is behind the scenes.'"
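The indirection Dutton describes can be pictured as a lookup table between client-facing path names and physical locations. The sketch below is purely illustrative: the class and method names are hypothetical, and this is not Acopia's actual implementation, just a minimal model of how a global namespace lets data move without clients noticing.

```python
# Minimal sketch of file-level path virtualization: clients address a
# stable virtual path while a mapping table tracks where the data
# physically lives. All names here are hypothetical illustrations.

class GlobalNamespace:
    def __init__(self):
        self._map = {}  # virtual path -> physical path

    def publish(self, virtual, physical):
        # Expose a share under a client-facing virtual path.
        self._map[virtual] = physical

    def resolve(self, virtual):
        # Clients call this instead of addressing the array directly.
        return self._map[virtual]

    def migrate(self, virtual, new_physical):
        # Data moves behind the scenes; the virtual path never changes,
        # so no client-side software needs to be reconfigured.
        self._map[virtual] = new_physical


ns = GlobalNamespace()
ns.publish("/cases/acme-v-foo", "/titan/shelf1/share42")
ns.migrate("/cases/acme-v-foo", "/titan/shelf3/share07")
print(ns.resolve("/cases/acme-v-foo"))  # prints "/titan/shelf3/share07"
```

The key property is that `migrate` touches only the mapping, which is why, as Dutton puts it, the application software never has to change.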

One big benefit of ARX has been significant gains in administrative efficiency. Administrators no longer have to manually reconfigure disk shelves when a project grows. "We can expand disk space virtually, letting a share grow as large as the project requires," Dutton says.

Another benefit is throughput. The ARX switches spread the data across a large number of disk spindles, reducing the risk of I/O bottlenecks. "It's load-balancing," Dutton comments. "Rather than filling up the storage system like a stack of chips on a poker table, it's like a swimming pool, where you're filling it evenly across the whole array."
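Dutton's "swimming pool" analogy amounts to a simple placement policy: route each new file to the shelf with the most free space rather than filling one shelf before starting the next. The function below is an illustrative sketch of that idea only; the shelf names and capacities are made up, and real switches balance on richer metrics than free space alone.

```python
# Hedged sketch of "swimming pool" placement: each new file goes to the
# least-full shelf, spreading I/O across many spindles instead of
# stacking everything on one location. Illustrative only.

def place_file(shelves, size):
    """Pick the shelf with the most free space and record the allocation."""
    target = max(shelves, key=lambda name: shelves[name])
    shelves[target] -= size
    return target

# Three hypothetical shelves, each with 100 units of free space.
shelves = {"shelf1": 100, "shelf2": 100, "shelf3": 100}
placements = [place_file(shelves, 30) for _ in range(6)]
# The six files end up spread across all three shelves rather than
# concentrated on one, so concurrent reads hit different spindles.
```

With hundreds of CPUs reading a project at once, this even spread is what keeps any single shelf from becoming the hot spot described earlier.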

A third benefit of the system is the ability to migrate data across storage tiers based on its age and other parameters using rules set up in Acopia's policy engine. Acopia doesn't care what kind of disk is behind it, so data can be moved from expensive Fibre Channel disks to less-expensive SATA disks when it gets older or less mission-critical.
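An age-based tiering rule of the kind described can be expressed as a small policy function: files untouched for some threshold move from Fibre Channel to SATA. The threshold and tier names below are assumptions for illustration, not details of Acopia's policy engine.

```python
# Illustrative age-based tiering rule: data untouched for more than a
# threshold number of days is assigned to cheaper SATA storage. The
# 90-day threshold and tier names are assumptions, not vendor settings.

import time

AGE_THRESHOLD_DAYS = 90
SECONDS_PER_DAY = 86400

def choose_tier(last_access_epoch, now=None):
    """Return the storage tier a file belongs on, based on its age."""
    now = now if now is not None else time.time()
    age_days = (now - last_access_epoch) / SECONDS_PER_DAY
    return "sata" if age_days > AGE_THRESHOLD_DAYS else "fibre_channel"

# A file last touched 120 days ago would be migrated to SATA;
# one touched today stays on Fibre Channel.
old_file = time.time() - 120 * SECONDS_PER_DAY
print(choose_tier(old_file))      # prints "sata"
print(choose_tier(time.time()))   # prints "fibre_channel"
```

Because the global namespace hides the physical move, such a policy can run continuously without users or applications seeing their paths change.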

"That takes a lot of administrative overhead off the IT staff," Dutton says. It also gives Ibis a choice of where it replicates data: Fibre Channel or SATA. Dutton is also considering virtual tape libraries for replication targets.

And because ARX is switch-based, the virtualization happens at the hardware level, avoiding the performance overhead of software-based virtualization solutions. This was a significant differentiator for Dutton, as "one of the parameters we want to maximize is throughput," he says. "In our world, less throughput translates to opportunity cost."

This becomes important when taking advantage of ARX's real-time shadow-copying capability, which writes data to two locations simultaneously instead of just one. Explains Dutton, "Only a hardware switch gives you that ability vs. software sitting in front of the NAS arrays because software takes cycle time to process the instructions, whereas a switch does it in real time so there's no degradation in throughput."

Future concerns

Dutton chose Acopia's ARX over other options partly because of its switch-based architecture, but also because it has gained a foothold in environments similar to those of Ibis' clients.

"Many of our clients are large-scale enterprise clients, so we need to integrate with the strategies they're deploying," Dutton notes. "As we become more integrated in their business processes, there will be a regular flow of information that will be subject to discovery."

Keeping in mind the likelihood that the NAS virtualization market will become both more competitive and murky in terms of which product is best for different environments, Dutton offers this advice: "Be clear with yourself what variables you want to maximize, whether it's throughput, ease of administration or integration with your existing storage array. Share those goals with your potential vendors and hold them accountable to achieving those goals."

In the future, Dutton would like to see NAS virtualization vendors collaborate on interoperability more extensively, both with each other and with the vendors of the systems they virtualize. "For the full future value of virtualization technology to be achieved, heterogeneous systems must play well together," he says.

For instance, greater interoperability among virtualization vendors would increase the rate at which the benefits of virtualization can spread across components of enterprise IT. And greater interoperability with storage providers would give users flexibility in the systems they are able to deploy without unnecessarily increasing administrative overhead in terms of purchasing, training and supporting multiple vendors.

Mary Brandel is a freelance writer based in Newton, Mass. She can be reached at marybrandel@verizon.net.

Copyright © 2005 IDG Communications, Inc.
