IT managers looking for battle-hardened, enterprise-scalable storage for their most demanding applications have a surprising new alternative: network-attached storage (NAS). While NAS (also known as file serving) itself is not new, next-generation NAS technologies are now on the market. NAS is in its next stage of evolution and includes industrial-strength solutions for IT managers seeking to streamline storage and file-server management.
While many still think of NAS as a workgroup file-sharing technology, an expanding group of storage managers now deploy NAS as their preferred storage infrastructure for mission-critical applications. NAS deployments continue to proliferate across departments and for core applications that demand exceptional performance and capacities. Applications include enterprisewide file-server consolidation, medical imaging, Web content delivery and computer animation.
In enterprise environments, IT managers face a growing tsunami of data that must be stored, protected and shared. With often tight budgets, IT managers need more efficient technologies that enable:
- Efficient management of large amounts of data files for many users and servers
- Dynamic adaptability to changing performance and capacity needs, and scaling without downtime
- 24/7/365 availability and data protection, including point-in-time (PIT) snapshots
These demands are not the traditional province of NAS. While NAS is known for great data-sharing capabilities, earlier solutions fell short in performance and capacity scalability. NAS vendors have responded with new, more enterprise-capable technologies that deliver enhancements in scalability, functionality, interoperability and cost-efficiency.
What are these new technologies, and how can they benefit your applications? This article examines typical large-scale NAS deployment scenarios, how they are similar and divergent, along with three NAS architecture options (legacy NAS, parallel and clustered file system NAS and clustered NAS gateways).
Two NAS deployment scenarios
Large-scale NAS deployments can be divided into two broad categories: commercial or enterprisewide data sharing, and high-performance computing (HPC) or technical computing. There are limitless subcategories under these, but the broad classifications are useful, since they capture the most critical differences between them.
This is not to say that enterprise and HPC environments are completely different; they do share common requirements that make NAS in general a good fit for both.
Enterprise and HPC similarities:
- Data sharing: In both enterprise and HPC applications, data is shared among multiple clients or servers. NAS is well suited for data sharing since any system on the network (if properly authenticated) can access data. Data can be readily shared among Windows, Unix, Linux and Mac platforms using NFS and CIFS multiprotocol access.
- Scalability is essential: Both deployments demand large-scale performance and capacity, far greater than previous NAS technologies delivered. Medical imaging, for example, can generate very large files, while animation and high-definition TV can demand throughput of multiple gigabytes per second. University home directories, on the other hand, may support tens of thousands of users.
- Availability: Enterprise applications naturally demand exceptional availability. Enterprise NAS has traditionally been strong in reliability and data integrity. Because NAS devices are designed specifically for file serving, they tend to have inherent reliability, availability and serviceability, along with integrated data protection capabilities.
Where Enterprise and HPC Diverge
The differences between enterprise and HPC deployments are more important than the similarities, because it is the unique characteristics of their requirements that define the next-generation NAS architectures.
Enterprise applications
Enterprisewide storage consolidation is essentially a large-scale extension of the workgroup deployment. Examples include large-scale home directory, enterprisewide file sharing, Web content delivery, education and health care.
In enterprise environments, a typical deployment objective is server consolidation. With millions of Microsoft Windows servers currently deployed worldwide, consolidation is a common issue. Within an organization, the need to consolidate becomes apparent when the server population outgrows the ability of the IT staff to manage them effectively. As server populations grow, IT managers often conclude that an investment in server consolidation is more cost-effective than continued staff expansion. A consolidation initiative requires significant planning because the intended transition (from many servers to a single centralized solution) has implications across the organization, implications that drive important requirements for the consolidation solution.
The politics of storage and departmental control
The most significant hurdle for server consolidation initiatives in many environments is not technology-related, but internal politics and control of resources. Consolidation projects run into trouble when groups that had control of their own servers resist the change to centralized storage. Whatever their motives -- it may be security concerns, application or client isolation, performance concerns, or simple turf wars -- the consolidation plan must take the politics of server and storage consolidation into account.
Virtual servers combine consolidation and autonomy
For the IT manager, there are two options. The first is to maintain separate file servers (or NAS devices) for specific groups. Since this is contrary to the whole point of consolidation, it is hardly optimal. The second option is a NAS deployment that includes virtual servers. Virtual servers are independent virtual entities (each with its own name, IP address and storage capacity) that coexist within one physical device. Groups retain autonomy while the IT manager consolidates hardware to a smaller number of physical devices. If performance is a concern, the issue is easily addressed: virtual servers allow transparent load balancing that eliminates bottlenecks.
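The idea of several independent virtual servers sharing one physical device can be sketched in a few lines. This is a hypothetical model, not any vendor's API; the names, addresses and capacities are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical model: each virtual server has its own name, IP address
# and storage allocation, yet several coexist on one physical device.
@dataclass
class VirtualServer:
    name: str
    ip: str
    capacity_gb: int

@dataclass
class PhysicalDevice:
    name: str
    total_gb: int
    vservers: list

    def add(self, vs: VirtualServer) -> bool:
        used = sum(v.capacity_gb for v in self.vservers)
        if used + vs.capacity_gb > self.total_gb:
            return False           # device full; deploy another node
        self.vservers.append(vs)
        return True

nas = PhysicalDevice("nas-01", total_gb=1000, vservers=[])
nas.add(VirtualServer("finance", "10.0.0.11", 400))
nas.add(VirtualServer("engineering", "10.0.0.12", 500))
print([v.name for v in nas.vservers])   # both groups on one device
```

Each group sees only its own virtual server, which is how consolidation and departmental autonomy coexist.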
Random throughput scalability
Another important factor in enterprise deployments is the nature of the workload: Data accesses tend to be random in nature and spread across many clients. Hence, random throughput, and the ability to scale it seamlessly, becomes the critical criterion. Fast random throughput translates to high productivity for clients and servers, which is why the independent performance benchmarking organization SPEC.org reports only random throughput in its file-server benchmark. The ideal enterprise deployment for transactional and small-I/O workloads should scale random throughput transparently.
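To make the random-throughput metric concrete, here is a minimal micro-benchmark sketch: it issues small reads at random offsets and reports operations per second. The file size, block size and read count are illustrative assumptions, and the numbers it prints reflect the local disk and cache, not any particular NAS product.

```python
import os
import random
import tempfile
import time

BLOCK = 4096                      # small I/O, typical of transactional work
FILE_SIZE = 4 * 1024 * 1024       # illustrative scratch-file size

# Build a scratch file to read from.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(FILE_SIZE))
    path = f.name

fd = os.open(path, os.O_RDONLY)
offsets = [random.randrange(0, FILE_SIZE - BLOCK) for _ in range(1000)]
start = time.perf_counter()
for off in offsets:
    os.pread(fd, BLOCK, off)      # one small read per random offset
elapsed = time.perf_counter() - start
os.close(fd)
os.unlink(path)

iops = len(offsets) / elapsed
print(f"~{iops:,.0f} random {BLOCK // 1024} KB reads/s")
```

Per-operation rate (I/Os per second) is the figure of merit here, in contrast to the megabytes-per-second measure that dominates HPC workloads.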
Robust multiprotocol support
Multiprotocol support is essential since enterprise deployments typically include a mix of Windows, Unix, Linux and Mac platforms. For a smooth consolidation project, user administration and authentication across all platforms must transition easily from the distributed environment.
Snapshots
Recovering data from tape is a costly and time-consuming process. Snapshots (or point-in-time images) greatly reduce the need to recover from tape and are therefore essential in enterprise NAS for rapid backup and recovery as part of a data protection strategy.
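The point-in-time semantics behind snapshots can be illustrated with a toy model. This is a deliberately simplified sketch, not a vendor implementation: real NAS snapshots use copy-on-write rather than full copies, but the restore behavior is the same.

```python
import copy

# Toy illustration of point-in-time (PIT) snapshots: capture the state
# of a "file system" so it can be restored without going to tape.
class FileSystem:
    def __init__(self):
        self.files = {}
        self.snapshots = {}

    def write(self, name, data):
        self.files[name] = data

    def snapshot(self, label):
        # Real snapshots are copy-on-write; a deep copy keeps this
        # sketch simple while preserving the same restore semantics.
        self.snapshots[label] = copy.deepcopy(self.files)

    def restore(self, label):
        self.files = copy.deepcopy(self.snapshots[label])

fs = FileSystem()
fs.write("report.doc", "v1")
fs.snapshot("nightly")
fs.write("report.doc", "v2 (corrupted)")
fs.restore("nightly")
print(fs.files["report.doc"])   # prints "v1": the pre-corruption version
```

Because the snapshot is taken in seconds and restored in seconds, routine recoveries never touch the tape library.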
Ease of use
Enterprise environments need easily managed technologies that do not require special skills or education.
High-performance computing
The second new application for NAS is high-performance computing, or HPC. Significantly different from enterprisewide consolidation, HPC environments tend to be more centralized and usually support a specific application such as computer graphics, simulations, seismic analysis or video postproduction. Compared with enterprisewide NAS, HPC will serve a smaller number of clients (usually of a single-platform type), situated within a single environment.
Sequential throughput is key
The application demands for HPC are primarily performance-driven, and will succeed or fail based on one specific metric: sequential throughput. Workloads in HPC tend to be generated by a smaller number of devices, and the file transfers themselves tend to be much larger. Data is more likely to be confined to a single file system, so the ability to scale throughput from that file system becomes critical. From a performance perspective, throughput (megabytes per second) is more important than I/Os per second.
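The contrast with the enterprise metric is easy to see in a sketch that streams a file front to back with large blocks and reports megabytes per second. As before, the file and block sizes are illustrative assumptions, and the result reflects the local machine, not any NAS device.

```python
import os
import tempfile
import time

BLOCK = 1024 * 1024                # 1 MB reads, typical of streaming I/O
FILE_SIZE = 32 * 1024 * 1024       # illustrative scratch-file size

# Build a scratch file to stream.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(FILE_SIZE))
    path = f.name

read_bytes = 0
start = time.perf_counter()
with open(path, "rb") as f:
    while chunk := f.read(BLOCK):  # sequential: front to back, large blocks
        read_bytes += len(chunk)
elapsed = time.perf_counter() - start
os.unlink(path)

mb_per_s = read_bytes / elapsed / (1024 * 1024)
print(f"~{mb_per_s:,.0f} MB/s sequential read")
```

Here throughput (MB/s) is the headline number; the operation count barely matters, which is the inverse of the transactional enterprise profile.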
Less demand for traditional NAS attributes
HPC environments usually have minimal need for the other NAS attributes required by enterprise environments. For example, robust multiprotocol support is not essential, since the environment is usually Unix- or Linux-dominated. Snapshots are unnecessary since the solution is not intended for large user groups. Ease of use is often less critical in HPC technical environments, since users tend to be localized and highly technical.
NAS gateways or parallel-access NAS
Because enterprise and HPC environments present very different requirements, attempting to accommodate both with a single solution would make little sense. Vendors have consequently responded with technologies optimized for each: NAS gateways geared toward the enterprise, and parallel-access NAS geared toward HPC.
Legacy NAS
To provide some context, it is helpful to first review the essentials of legacy NAS. The conventional NAS deployment includes either a single node or a pair of nodes ("NAS heads"), each supporting a specific set of workloads and file systems. In fail-over scenarios, workloads can move from one node to the other, but normally they remain fixed on one NAS head, accessing disk drives shared between the nodes.
Legacy NAS characteristics:
- Simple active/active architecture for redundancy
- Turnkey appliance with integrated disk and NAS processor
- Scaling of performance and capacity limited by the nodes
- Robust multiprotocol and host server platform support
- File-system-to-node affinity, where a file system is active on only one node at a time
Ironically, the ease of use and deployment as a turnkey appliance is also the weakness of legacy NAS when it comes to scaling capacity. Since clients and disk capacity are associated with one of the NAS heads (or a clustered pair of NAS heads), when requirements outgrow the capabilities of legacy NAS, users must be manually migrated to a new NAS device. This management headache of ongoing resource shuffling ultimately becomes the limitation to growth.
Clustered NAS gateway
Clustered NAS gateways resolve the chief drawback of legacy NAS by eliminating the disruptive nature of moving workloads. Performance and capacity scale fluidly without data migration or user disruption.
To accomplish this, clustered NAS gateways employ abstraction, also known as virtualization. Clients and servers access virtual servers that can be transparently moved among any of the NAS gateways in the cluster. Thus, the deployment can grow on demand to meet changing needs. The virtual servers support consolidation while maintaining autonomy and isolation of users and workloads. This concept has a direct analogy in the application server space: several vendors offer software that creates virtual application servers within a clustered server environment.
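The transparent load balancing described above amounts to shifting a virtual server from a busy gateway to an idle one while clients keep addressing the virtual server rather than the physical node. A minimal sketch, with hypothetical gateway names and load figures:

```python
# Sketch of transparent load balancing in a clustered NAS gateway:
# virtual servers are mapped to gateways, and the lightest virtual
# server on the busiest gateway is shifted to the least busy one.
gateways = {
    "gw1": {"vs-a": 70, "vs-b": 40},   # virtual server -> load units
    "gw2": {"vs-c": 20},
    "gw3": {"vs-d": 30},
}

def rebalance(gws):
    load = {g: sum(v.values()) for g, v in gws.items()}
    busiest = max(load, key=load.get)
    idlest = min(load, key=load.get)
    # Move the lightest virtual server off the busiest gateway;
    # clients still address the virtual server, so they see no change.
    vs = min(gws[busiest], key=gws[busiest].get)
    gws[idlest][vs] = gws[busiest].pop(vs)
    return vs, busiest, idlest

moved = rebalance(gateways)
print(moved)   # prints ('vs-b', 'gw1', 'gw2')
```

Because the client-facing identity (name and IP) travels with the virtual server, the shift is invisible to users, which is what makes scaling non-disruptive.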
Clustered NAS gateway characteristics:
- NAS gateways within a cluster are independently available to service requests
- Performance of the NAS gateway cluster scales throughput with the addition of NAS gateways
- Transparent load balancing achieved by shifting virtual servers among NAS gateways
- Multiprotocol and platform (Windows, Linux, Unix, Mac) support
- Integrated point-in-time (PIT) snapshot copies for data protection
- Capacity scaling through external open storage
Clustered and parallel-access NAS
Clustered and parallel-access NAS devices take a fundamentally different approach from clustered NAS gateways. Designed for high-performance computing and technical environments, this approach delivers scalable sequential throughput rather than the scalable random throughput found on enterprise NAS gateways.
Clustered NAS gateway scaling model
Virtualization allows clustered NAS gateways to scale in a seamless manner: