Linux clusters move toward the mainstream

In mid-2004, Niall O'Driscoll faced an all-too-common problem: He knew his storage needs were growing quickly, but he didn't know much more than that.

O'Driscoll is vice president of engineering for Alexa Internet, Inc., a subsidiary of Amazon.com that collects 1.6TB of Web data and traffic information each day and sells detailed analyses of that information to developers and other users. He needed to make that data easier to access for his customers, but he had no idea how much any customer would need, when they would need it, how large the files they created would be, or whether they would access those files sequentially or randomly.

His answer: Deploy a clustered file system from Ibrix Inc. It places no limits on the sizes of files or partitions written to it, handles Fibre Channel or SATA drives (or a combination of both) and distributes metadata (information about what data is stored on which arrays) across the storage nodes, making the system highly scalable and eliminating a single point of failure.

"I don't know of many businesses that can say, 'This is what my storage requirements are going to be for the next five years,'" says O'Driscoll. However, clusters can make it easier to cope with unexpected requirements while also reducing dependence on a single storage server that can fail or become too expensive to maintain.

While Linux storage clusters are still mostly used to feed data to, or store data generated by, scientific and technical applications, some customers and analysts see them moving toward wider use by mainstream customers such as O'Driscoll. They come in a variety of forms, from pure software to plug-and-play hardware appliances.

In any case, storage managers and analysts suggest getting expert help and doing a test drive to see if Linux clusters are the right fit for a given environment. Another important issue is whether, and how well, the databases used in that environment support clustering.

Cluster benefits

Clustered storage (a form of grid storage) refers to storage servers that are connected to each other and share a common file system, says David Freund, a practice leader in information architectures at Illuminata, an industry analysis firm in Nashua, N.H. This attribute allows any user linked to any server to see any data stored anywhere in the cluster, and prevents any storage server or file system from becoming a performance bottleneck or a single point of failure.

As customer needs grow, they can add more storage servers without losing the benefits of the single name space that makes it easy to find and share data.

One potential class of customers for Linux-based storage consists of those who have purchased multiple NAS appliances and are having trouble managing them, says Freund. Unlike the case with Linux storage clusters, he says, moving files across NAS appliances is a manual, time-consuming operation that requires stopping the applications that depend on that data.

Some SAN customers might also consider clusters because SANs haven't delivered on their promise of providing large pools of storage that storage managers can slice and dice among all their applications and servers as they see fit, says Freund.

A variety of choices

Some Linux storage clusters are based on the Linux open source operating system, while others use other operating systems but are designed for use with clusters of Linux computing nodes.

Hewlett-Packard uses the open source Lustre file system as the foundation for its HP StorageWorks Scalable File Share, a storage appliance that splits storage, indexing, search and retrieval tasks across Lustre-based storage servers that work together to create a single, shared file system. As is the case with many other cluster vendors, HP says customers can add storage capacity by simply adding more servers (or, in HP's parlance, "smart cells") to the system.

Panasas, Inc.'s ActiveScale File System, the core of the company's storage arrays and switches, solves many common scalability problems by storing and serving portions of files, along with metadata about those files, across multiple storage servers, says Larry Jones, vice president of marketing at Panasas. The product also reduces storage management chores by automatically creating and provisioning new volumes as needed, rebalancing the amount of data storage on various arrays and creating new RAID groups, Jones says.
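The idea of storing portions of a file across multiple storage servers can be illustrated with a generic round-robin striping sketch. This is not Panasas' actual layout algorithm, which the article does not describe; the function name, the stripe unit size and the server names are all hypothetical.

```python
def stripe_layout(file_size, stripe_unit, servers):
    """Map each server to the (offset, length) byte ranges it would hold.

    Generic round-robin striping, for illustration only: the file is cut
    into fixed-size stripe units and unit i goes to server i % len(servers).
    """
    if stripe_unit <= 0 or not servers:
        raise ValueError("stripe_unit must be positive and servers non-empty")
    layout = {name: [] for name in servers}
    for index, offset in enumerate(range(0, file_size, stripe_unit)):
        # The last unit may be shorter than stripe_unit.
        length = min(stripe_unit, file_size - offset)
        layout[servers[index % len(servers)]].append((offset, length))
    return layout


if __name__ == "__main__":
    # Hypothetical example: a 10-byte file, 4-byte stripe units, two servers.
    for server, ranges in stripe_layout(10, 4, ["server-a", "server-b"]).items():
        print(server, ranges)
```

Because consecutive units land on different servers, a large sequential read or write can draw on several servers at once instead of queuing behind one.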

Red Hat offers the Global File System (GFS) it acquired with its purchase of Sistina Software in December 2003. GFS is sold as software that runs on Red Hat Enterprise Linux servers attached to SANs, and allows clusters of servers to share data whether the data is within a server or on the SAN. In addition to its compatibility with high-performance compute environments, Red Hat claims GFS also is useful with application and Web servers. Freund says GFS cannot scale as large, or as easily, as a Lustre-based file system but is well-suited to less demanding applications.

Ibrix's FusionFS file system, also sold as software, distributes metadata across multiple servers to achieve higher performance and fault tolerance. Ibrix claims this "segmented" file system architecture can scale to a single 16-petabyte name space and provide as much as one terabyte per second of throughput.

PolyServe's Matrix Server is a symmetrical distributed file system for both Linux and Windows, which the company claims is being used by commercial customers for applications such as reservations and point-of-sale systems. Freund says Matrix Server is especially suited for customers who want a plug-and-play clustered NAS system.

Avoiding bottlenecks

At Stanford University's Institute for Computational and Mathematical Engineering, a typical job involves running a fluid dynamics analysis program on a 64-processor Linux cluster to analyze how different weapons configurations will affect the performance of an aircraft. Each aircraft might require 400,000 simulations, each producing about 20GB of data.

Steve Jones, technology operations manager at the institute, began shopping for clustered storage when his single NFS file server bogged down after 24 dual-processor nodes tried writing data to it. After several other vendors passed on demonstrating their clusters in his environment, Panasas delivered several evaluation units that Jones got working in less than two hours.

With Panasas, his throughput has risen from about 18MB per second to between 150MB and 190MB per second, he says, with very little management required. "It's almost as simple as managing a set of hard drives connected to a RAID adapter," he says.
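To put the Stanford figures in perspective, the arithmetic can be sketched as follows. The sketch uses only numbers quoted in the article; it assumes decimal units (1GB = 1,000MB) and, as a simplification not stated in the article, that a single 20GB result set gets the whole quoted throughput to itself.

```python
# Back-of-the-envelope figures from the numbers quoted in the article.
# Assumptions: decimal units, and one job using the full quoted throughput.

SIMULATIONS_PER_AIRCRAFT = 400_000
GB_PER_SIMULATION = 20


def total_petabytes(simulations, gb_each):
    """Total output volume in petabytes (1 PB = 1,000,000 GB)."""
    return simulations * gb_each / 1_000_000


def write_seconds(gb, mb_per_second):
    """Seconds to write `gb` gigabytes at a sustained rate, ignoring overhead."""
    return gb * 1_000 / mb_per_second


if __name__ == "__main__":
    volume = total_petabytes(SIMULATIONS_PER_AIRCRAFT, GB_PER_SIMULATION)
    print(f"Output per aircraft: {volume:.0f} PB")
    for rate in (18, 150, 190):
        minutes = write_seconds(GB_PER_SIMULATION, rate) / 60
        print(f"20GB at {rate}MB per second: about {minutes:.1f} minutes")
```

Under these assumptions, a full set of simulations for one aircraft comes to roughly 8 petabytes, and writing one 20GB result drops from about 18.5 minutes to roughly two.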

Maurice Askinazi, group leader of the General Computing Environment at the Relativistic Heavy Ion Collider Computing Facility at the Brookhaven National Laboratory in Upton, N.Y., had a similar problem. Askinazi's NFS servers produce petabytes of experimental data that is deposited into a tape facility by the collider's detectors. The data is made available to the Linux computing cluster for processing and analysis.

The 100Mbit/sec bandwidth limitation of the NFS server's network interface, along with NFS's performance limit of 25Mbit/sec per request, starved the cluster of data. Because of NFS's limitations, if many nodes queried the same NFS server at once, some of the nodes would idle while waiting for data to become available.

Askinazi has since placed about 70 terabytes of experimental data on a Panasas ActiveScale Storage System. There are four Gigabit Ethernet network connections on each ActiveScale chassis. Each chassis holds 11 blades or approximately five terabytes of storage. This in effect quadrupled bandwidth and will easily scale as more chassis are added. Using Panasas DirectFlow software, all the compute nodes have unrestricted access to the bandwidth, making it unlikely that the nodes will have to wait for data.
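The Brookhaven bandwidth figures lend themselves to a similar rough calculation. The sketch below works from nominal link rates only and ignores protocol overhead. The chassis count is an estimate derived from 70 terabytes at about five terabytes per chassis; the article does not say how many chassis were actually installed.

```python
import math

# Figures quoted in the article (nominal rates, no protocol overhead).
NFS_NIC_MBIT = 100          # old NFS server network interface
NFS_PER_REQUEST_MBIT = 25   # NFS per-request limit
LINKS_PER_CHASSIS = 4       # Gigabit Ethernet connections per chassis
LINK_MBIT = 1_000           # nominal Gigabit Ethernet rate
TB_PER_CHASSIS = 5          # approximate capacity per chassis
DATASET_TB = 70


def saturating_requests(nic_mbit, per_request_mbit):
    """Full-rate requests that fit through the NIC at the same time."""
    return nic_mbit // per_request_mbit


def chassis_needed(dataset_tb, tb_per_chassis):
    """Estimated chassis count for the dataset (an assumption, see lead-in)."""
    return math.ceil(dataset_tb / tb_per_chassis)


def aggregate_mbit(chassis, links, link_mbit):
    """Nominal aggregate network bandwidth across all chassis."""
    return chassis * links * link_mbit


if __name__ == "__main__":
    n = chassis_needed(DATASET_TB, TB_PER_CHASSIS)
    print("Requests that saturate the old NIC:",
          saturating_requests(NFS_NIC_MBIT, NFS_PER_REQUEST_MBIT))
    print("Estimated chassis:", n)
    print("Nominal aggregate Mbit/sec:",
          aggregate_mbit(n, LINKS_PER_CHASSIS, LINK_MBIT))
```

On these numbers, four simultaneous full-rate requests were enough to saturate the old NFS server's interface, which is why additional nodes sat idle, while each added chassis brings its own four Gigabit links.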

Unlike a Fibre Channel SAN, which requires the purchase, installation and maintenance of many complex components, "When you buy from Panasas, you buy one integrated system. You just plug it in and it's ready to go," Askinazi says.

Michael Shuey, a high-performance computing architect at Purdue University, turned to Ibrix last fall to cope with storage needs that were growing "near exponentially" as university researchers gathered more and more data for each simulation they ran.

"We were primarily concerned with provisioning the storage to our computing resources, and with ensuring reasonable utilization and bandwidth to the storage," Shuey notes. "Ibrix gives us better than 98% of the maximum observable throughput from the 23TB Fibre Channel RAID array it manages, and fits fairly well into most of our Linux administrative practices."

More mainstream?

Such clusters are increasingly making their way into commercial settings, comments Freund. As the power of low-cost commodity hardware (such as clusters of Intel-based Linux servers) grows, he says, a bank scanning millions of credit card transactions to sniff out fraud may run equations very similar to those used by the folks doing seismic analysis for an oil company, a typical technical computing application.

Employing clustering for more everyday database applications requires a detailed analysis of how the applications call the database and how well the database supports clustering. Storage clusters based on Linux (rather than a proprietary operating system) may be best for customers with the needs and technical skills to customize how Linux runs on their clusters.

Because clustering is still a young technology, and because there's a lack of standard benchmarks for comparing products, Freund recommends hiring an expert to analyze a customer's current and future database workloads, as well as its storage and server infrastructure.

In his words, "You want to take it for a test drive because its performance could be dramatically different than the vendor's claims based on what kind of I/O load your configuration presents."

Robert L. Scheier is a freelance writer based in Boylston, Mass.

Copyright © 2005 IDG Communications, Inc.
