Managing disk storage was once simple: If we needed more space, we got a bigger disk drive. But data storage needs grew, so we started adding multiple disk drives. Finding and managing these became harder and took more time, so we developed RAID, network-attached storage and storage-area networks. Still, managing and maintaining thousands of disk drives became an ever more onerous task.
The latest answer to this dilemma is storage virtualization, which adds a new layer of software and/or hardware between storage systems and servers, so that applications no longer need to know on which specific drives, partitions or storage subsystems their data resides. Administrators can identify, provision and manage distributed storage as if it were a single, consolidated resource. Availability also increases with storage virtualization, since applications aren't restricted to specific storage resources and are thus insulated from most interruptions.
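To make the idea concrete, here is a minimal Python sketch of the lookup a virtualization layer performs. It assumes the simplest possible metadata: a per-volume table from logical block numbers to (device, block) pairs. The class names and device names such as "array-a" are hypothetical, and real products use far more compact extent maps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    device: str  # hypothetical device name, e.g. "array-a"
    block: int   # block number on that device


class VirtualVolume:
    """Maps a volume's logical block numbers to blocks on pooled devices."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._map: dict[int, PhysicalLocation] = {}

    def map_block(self, logical: int, device: str, block: int) -> None:
        # Record (or change) where a logical block physically lives.
        self._map[logical] = PhysicalLocation(device, block)

    def resolve(self, logical: int) -> PhysicalLocation:
        # The application supplies only `logical`; the layer decides where it lives.
        try:
            return self._map[logical]
        except KeyError:
            raise KeyError(f"block {logical} of {self.name} is not mapped") from None


vol = VirtualVolume("db")
vol.map_block(0, "array-a", 512)
vol.map_block(1, "array-b", 40)
print(vol.resolve(1))
```

An application asks for block 1 of volume "db" and never learns that it sits on a second array; changing that one table entry is all it would take to relocate the data.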
Storage virtualization also generally helps automate the expansion of storage capacity, reducing the need for manual provisioning. Storage resources can be updated on the fly without disrupting applications, which reduces downtime.
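On-the-fly expansion can be sketched the same way, under simplifying assumptions: the pool hands out fixed-size extents with a first-fit policy, and adding a device simply adds free extents without touching anything already allocated. StoragePool, add_device and allocate are illustrative names, not any product's API.

```python
class StoragePool:
    """A pool of devices whose capacity grows when a device is added."""

    def __init__(self) -> None:
        self._free: dict[str, int] = {}  # device name -> free extents left
        self._next: dict[str, int] = {}  # device name -> next unused extent number

    def add_device(self, name: str, extents: int) -> None:
        # Adding capacity leaves every existing allocation where it is.
        if name in self._free:
            raise ValueError(f"device {name} is already in the pool")
        self._free[name] = extents
        self._next[name] = 0

    @property
    def free_extents(self) -> int:
        return sum(self._free.values())

    def allocate(self) -> tuple[str, int]:
        # First fit: take an extent from the first device that still has room.
        for name, free in self._free.items():
            if free > 0:
                self._free[name] -= 1
                extent = self._next[name]
                self._next[name] += 1
                return name, extent
        raise RuntimeError("pool exhausted: add a device")


pool = StoragePool()
pool.add_device("array-a", 2)
pool.allocate()
pool.allocate()
pool.add_device("array-b", 4)  # hot-added capacity, no reconfiguration of old data
print(pool.allocate(), pool.free_extents)
```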
Because virtualization operates as an intermediate layer, it becomes the primary interface between servers and storage. Servers see the virtualization layer as a single storage device, while all the individual storage devices see the virtualization layer as their only server. This makes it easy to group storage systems -- even devices from different vendors -- into tiers of storage.
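The two faces of the layer can be sketched as follows. This assumes each back-end device has already been labeled with a tier; the "fast" and "cheap" labels and the vendor names are hypothetical. Upward, the layer reports one combined device; downward, it groups back ends by tier regardless of who made them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    vendor: str
    capacity_gb: int
    tier: str  # hypothetical tier label, e.g. "fast" or "cheap"


class VirtualizationLayer:
    def __init__(self, backends: list[Backend]) -> None:
        self.backends = backends

    def capacity_seen_by_servers(self) -> int:
        # Servers see a single device whose size is the sum of all back ends.
        return sum(b.capacity_gb for b in self.backends)

    def tiers(self) -> dict[str, list[str]]:
        # Group back ends by tier; the vendor plays no part in the grouping.
        groups: dict[str, list[str]] = defaultdict(list)
        for b in self.backends:
            groups[b.tier].append(b.name)
        return dict(groups)


layer = VirtualizationLayer([
    Backend("a1", "VendorA", 1000, "fast"),
    Backend("b1", "VendorB", 4000, "cheap"),
    Backend("a2", "VendorA", 2000, "cheap"),
])
print(layer.capacity_seen_by_servers(), layer.tiers())
```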
This layer shields servers and applications from changes to the storage environment, letting users easily hot-swap a disk or tape drive. Data-copying services are also managed at the virtualization layer. This means that data replication, whether for snapshot or disaster recovery, can be handled entirely by the virtualization system, often in the background, with a common management interface. Because data can be moved at will, lightly used or outdated data can be easily moved to slower, less-expensive storage devices.
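Moving lightly used data to cheaper storage amounts to remapping. The sketch below assumes per-extent access counts and a fixed threshold, both hypothetical policy choices. It only updates the mapping, whereas a real system would copy the data in the background before switching over.

```python
from dataclasses import dataclass, field


@dataclass
class Extent:
    tier: str
    accesses: int = 0


@dataclass
class TieredVolume:
    extents: dict[int, Extent] = field(default_factory=dict)

    def read(self, logical: int) -> None:
        # Stand-in for serving a read: only the access count is recorded.
        self.extents[logical].accesses += 1

    def demote_cold(self, threshold: int, slow_tier: str = "cheap") -> list[int]:
        """Move extents read fewer than `threshold` times to the slow tier.

        Logical extent numbers never change, so applications are unaffected.
        """
        moved = []
        for logical, ext in self.extents.items():
            if ext.accesses < threshold and ext.tier != slow_tier:
                ext.tier = slow_tier  # mapping switch only; no data copy modeled
                moved.append(logical)
            ext.accesses = 0  # start a new measurement window
        return moved


vol = TieredVolume({0: Extent("fast"), 1: Extent("fast"), 2: Extent("cheap")})
for _ in range(5):
    vol.read(0)
vol.read(1)
print(vol.demote_cold(threshold=3))
```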
Storage virtualization can be structured in three ways:
- Host-based. Here, physical drives are handled by a traditional device driver, while a software layer above the device driver intercepts I/O requests, looks up metadata and redirects I/O.
- Storage-device-based. In this type of setup, virtualization is built into the storage hardware itself; for example, newer RAID controllers allow other storage devices to be attached downstream. A primary storage controller (usually a dedicated hardware appliance, though some systems now use switches) handles pooling and manages metadata for the other storage controllers attached directly to it. Such systems may also provide replication and migration services across different controllers.
- Network-based. In this configuration, storage virtualization runs on a device in the network itself, generally a Fibre Channel SAN. Here, too, an appliance or switch-based implementation is most common.
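The host-based approach lends itself to a short sketch. Here RamDriver is a hypothetical in-memory stand-in for a traditional device driver, and HostVirtualDisk is the software layer above it that intercepts each request, looks up its metadata and redirects the I/O. The mapping is a simple linear concatenation of drives, chosen for brevity, and the 512-byte block size is an assumption.

```python
BLOCK_SIZE = 512  # bytes per block (assumed)


class RamDriver:
    """Hypothetical stand-in for a traditional driver over one physical drive."""

    def __init__(self, blocks: int) -> None:
        self.blocks = blocks
        self._data = bytearray(blocks * BLOCK_SIZE)

    def read(self, block: int) -> bytes:
        start = block * BLOCK_SIZE
        return bytes(self._data[start:start + BLOCK_SIZE])

    def write(self, block: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        start = block * BLOCK_SIZE
        self._data[start:start + BLOCK_SIZE] = data


class HostVirtualDisk:
    """Software layer above the drivers: intercepts I/O and redirects it."""

    def __init__(self, drivers: list[RamDriver]) -> None:
        # Metadata: (first logical block, driver) for each drive, concatenated.
        self._segments = []
        start = 0
        for drv in drivers:
            self._segments.append((start, drv))
            start += drv.blocks
        self.blocks = start

    def _lookup(self, logical: int) -> tuple[RamDriver, int]:
        # Find the drive that owns `logical` and the block offset on that drive.
        if not 0 <= logical < self.blocks:
            raise IndexError(f"logical block {logical} out of range")
        for first, drv in reversed(self._segments):
            if logical >= first:
                return drv, logical - first
        raise AssertionError("unreachable")

    def read(self, logical: int) -> bytes:
        drv, physical = self._lookup(logical)
        return drv.read(physical)

    def write(self, logical: int, data: bytes) -> None:
        drv, physical = self._lookup(logical)
        drv.write(physical, data)


a, b = RamDriver(4), RamDriver(8)
disk = HostVirtualDisk([a, b])
disk.write(5, b"x" * BLOCK_SIZE)  # lands on block 1 of the second drive
```

Broadly, the same intercept, look up and redirect steps apply in the storage-device-based and network-based designs; what changes is where the code runs.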
Experienced users agree that all three approaches can work well. But although virtualization promotes cross-vendor storage utilization, most implementations lock you into a specific vendor.
Kay is a Computerworld contributing writer in Worcester, Mass.
- To learn more about storage virtualization, first consult the entry in Wikipedia.
- A tutorial on storage virtualization is available at the Storage Networking Industry Association's Web site.
This version of the story originally appeared in Computerworld's print edition.