Companies may want to skip using a tiered storage architecture and move directly to an all-solid-state-drive (SSD) architecture, according to a new report from Forrester Research.
In the report, Forrester contends that while enterprise-class SSDs are vastly more expensive than hard disk drives, deduplication can reduce capacity requirements, making flash a cost-effective, better-performing alternative.
"If cost were no object, you would put all your data on flash-based SSD media," the report said. "It's not only much faster than spinning disk drives are today, but it also has no moving parts, consumes less power, and eliminates the seek time and variable performance -- and there's no chance disk drives will catch up in any of these areas."
SSDs are now used as a top tier of storage in external storage arrays, alongside a mix of hard drives: high-capacity SATA drives and lower-capacity but higher-performance SAS and Fibre Channel drives. The idea behind tiered infrastructures is to put the most frequently accessed data on the highest-performance drives and migrate less frequently used data to high-capacity, low-cost hard drives.
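The placement rule behind tiering can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's algorithm: it assumes a hypothetical access log of block IDs, counts accesses per block and puts the hottest blocks on an SSD tier of fixed size, leaving the rest on hard drives. The function name `place_by_heat` and the block-level granularity are assumptions made for the example.

```python
from collections import Counter


def place_by_heat(access_log, ssd_slots):
    """Assign the most frequently accessed blocks to the SSD tier.

    access_log: iterable of block IDs, one entry per access (hypothetical input).
    ssd_slots: how many blocks fit on the SSD tier.
    Returns (ssd_blocks, hdd_blocks) as sets; blocks never accessed are ignored.
    """
    counts = Counter(access_log)
    # Hottest first; ties broken by block ID so the result is deterministic.
    ranked = sorted(counts, key=lambda block: (-counts[block], block))
    return set(ranked[:ssd_slots]), set(ranked[ssd_slots:])


if __name__ == "__main__":
    log = ["a", "b", "a", "c", "a", "b", "d"]
    ssd, hdd = place_by_heat(log, ssd_slots=2)
    print("SSD tier:", sorted(ssd))  # SSD tier: ['a', 'b']
    print("HDD tier:", sorted(hdd))  # HDD tier: ['c', 'd']
```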
But major storage vendors of tiered arrays have "shoehorned flash drives" into their existing disk arrays, which can translate into I/O bottlenecks. It also means administrators must know what data to place on the SSD or rely on still-nascent automated data tiering software.
High costs, management woes
According to Forrester, SSDs can be up to 10 times more expensive than hard drives; other research firms put the gap far higher. Market research from iSuppli and Objective-Analysis shows SSD pricing averages around $17 per gigabyte today, a figure expected to drop to $12 per gigabyte next year and to $5 per gigabyte by 2015.
While tiered architectures can offer better performance and higher disk utilization rates, Forrester's report said that tiering also creates data management problems.
For example, many corporate IT shops don't use advanced storage performance analytics tools, so they have to determine manually which data requires the highest performance and move it between tiers by hand. Additionally, "hot data," the data most frequently accessed, changes over time, which means IT staff will be kept busy monitoring and moving data as it changes.
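Because hot data changes over time, a placement made from past access counts can go stale. The hypothetical sketch below picks SSD-resident blocks from one window of past accesses, then measures how many accesses in the next window actually land on the SSD tier. The window-based policy and the function names are assumptions for illustration; automated tiering products use their own, more elaborate heuristics.

```python
from collections import Counter


def promote_from_history(past_window, ssd_slots):
    """Choose SSD-resident blocks using only past access counts."""
    counts = Counter(past_window)
    ranked = sorted(counts, key=lambda block: (-counts[block], block))
    return set(ranked[:ssd_slots])


def ssd_hit_rate(ssd_blocks, next_window):
    """Fraction of later accesses that are served from the SSD tier."""
    if not next_window:
        return 0.0
    hits = sum(1 for block in next_window if block in ssd_blocks)
    return hits / len(next_window)


if __name__ == "__main__":
    yesterday = ["a", "a", "a", "b", "b", "c"]  # "a" and "b" were hot
    today = ["c", "c", "c", "d", "d", "a"]      # the hot set has shifted
    ssd = promote_from_history(yesterday, ssd_slots=2)
    print(sorted(ssd))                          # ['a', 'b']
    print(f"{ssd_hit_rate(ssd, today):.2f}")    # 0.17
```

Because the policy sees only yesterday's workload, the shift toward block "c" leaves the SSD tier holding data that has gone cold, and only one of today's six accesses is served from flash.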
While there is automated tiering software, such as Dell Compellent's Fluid Data storage offering and EMC's Fully Automated Storage Tiering (FAST) software, retrofitting existing systems that weren't designed for sub-volume data movement "is a significant challenge," Forrester said.
"The efficiency and effectiveness of these solutions vary. There's also an inherent performance overhead penalty to the constant movement," the report said. "Finally, the information used to make decisions is backward-looking -- just because a piece of data hasn't been hot recently doesn't mean that it won't be in the future."
Enter in-line data deduplication
However, a new architecture now making waves is an all-SSD infrastructure that uses inline data deduplication to reduce back-end capacity requirements by eliminating redundant data before it's stored.
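A toy model of inline deduplication follows. It assumes fixed-size chunking and SHA-256 fingerprints, which are common textbook choices; the report does not describe how Nimbus, Pure Storage or SolidFire actually chunk or fingerprint data, and the `DedupStore` class is hypothetical. Duplicate chunks are discarded on the write path, before anything is stored, and `read` shows the rehydration step.

```python
import hashlib


class DedupStore:
    """Toy inline-deduplicating store (hypothetical, for illustration only).

    Incoming data is split into fixed-size chunks; each chunk is fingerprinted
    with SHA-256 and kept only if that fingerprint has not been seen before.
    """

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}       # fingerprint -> chunk bytes (physical store)
        self.objects = {}      # object name -> ordered list of fingerprints
        self.logical_bytes = 0

    def write(self, name, data):
        if name in self.objects:
            raise ValueError("overwrites are not handled in this sketch")
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            # Inline step: a duplicate chunk is dropped here, before it is stored.
            self.chunks.setdefault(fp, chunk)
            recipe.append(fp)
        self.objects[name] = recipe
        self.logical_bytes += len(data)

    def read(self, name):
        # Rehydration: rebuild the original bytes from the stored chunks.
        return b"".join(self.chunks[fp] for fp in self.objects[name])

    def physical_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())

    def dedup_ratio(self):
        physical = self.physical_bytes()
        return self.logical_bytes / physical if physical else 1.0


if __name__ == "__main__":
    store = DedupStore(chunk_size=4)
    store.write("vm1", b"ABCDABCDABCD")
    store.write("vm2", b"ABCDWXYZ")
    print(store.physical_bytes())  # 8
    print(store.dedup_ratio())     # 2.5
    print(store.read("vm2"))       # b'ABCDWXYZ'
```

Twenty logical bytes end up as eight physical bytes because the chunk "ABCD" is stored once and referenced four times.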
This approach is even more effective than deduplicating data across both disk and SSD, Forrester said, because the performance surplus and consistent latency of an SSD-only system enable faster inline deduplication and data rehydration than a hybrid disk-and-flash system can deliver.
There's a big difference between single-level cell (SLC) enterprise-class NAND flash and multi-level cell (MLC) flash in terms of performance, longevity and price. SLC NAND stores only one bit of data per cell, versus two or three bits per cell in MLC. That gives SLC inherently higher performance and a lifespan as much as 10 times that of MLC.
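One way to see the SLC/MLC trade-off: a cell that stores n bits must distinguish 2^n charge levels, so each extra bit per cell packs more levels into the same voltage window. The narrower margins between levels are generally cited as a reason for slower programming and lower endurance, though this sketch shows only the arithmetic, not device behavior. The function names are illustrative.

```python
def charge_levels(bits_per_cell):
    """Distinct charge states a cell must hold to store this many bits."""
    return 2 ** bits_per_cell


def cells_needed(capacity_bits, bits_per_cell):
    """Cells required to hold capacity_bits (rounded up)."""
    return -(-capacity_bits // bits_per_cell)


if __name__ == "__main__":
    for name, bits in [("SLC", 1), ("MLC", 2), ("3-bit MLC", 3)]:
        # Levels per cell, and cells needed for 8,192 bits (1 KiB).
        print(name, charge_levels(bits), cells_needed(8192, bits))
```

This prints 2, 4 and 8 levels per cell, and 8,192, 4,096 and 2,731 cells respectively: more density per cell, but finer distinctions the controller must read and write.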
Currently, the price for NAND flash in an SSD form factor is about $9 per gigabyte for SLC flash and about $3 per gigabyte for MLC flash. A new class of MLC, called enterprise MLC or eMLC, can withstand up to 30 times more writes than consumer-grade MLC flash technology can, but it also costs about 20% more.
By comparison, a Fibre Channel or SAS drive costs 50 to 60 cents per gigabyte.
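The gap between these figures is what deduplication has to close. The sketch below divides a raw flash price by an assumed deduplication ratio to get the cost per gigabyte of data actually stored, and computes the break-even ratio against disk. It uses the per-gigabyte prices quoted above, models eMLC as MLC plus 20%, and treats 5:1 purely as an illustrative ratio; it ignores RAID overhead, controllers and everything else that goes into a real system price.

```python
def effective_cost_per_gb(raw_cost_per_gb, dedup_ratio):
    """Cost per gigabyte of logical data once duplicates are removed."""
    if dedup_ratio <= 0:
        raise ValueError("dedup_ratio must be positive")
    return raw_cost_per_gb / dedup_ratio


def break_even_ratio(flash_cost_per_gb, disk_cost_per_gb):
    """Reduction ratio at which flash matches disk on cost per gigabyte."""
    return flash_cost_per_gb / disk_cost_per_gb


if __name__ == "__main__":
    # Per-gigabyte media prices cited above; eMLC modeled as MLC + 20%.
    flash = {"SLC": 9.00, "MLC": 3.00, "eMLC": 3.00 * 1.2}
    for kind, price in flash.items():
        print(f"{kind}: ${effective_cost_per_gb(price, 5):.2f}/GB at an assumed 5:1, "
              f"break-even vs. $0.50/GB disk at {break_even_ratio(price, 0.50):.1f}:1")
```

With these inputs, MLC at $3 per gigabyte would need roughly a 5:1 to 6:1 reduction ratio to match disk at 50 to 60 cents per gigabyte, while SLC would need 15:1 to 18:1.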
There are also PCIe NAND flash cards, such as those sold by Fusion-io, Texas Memory Systems, Micron and Virident Systems. Flash cards can be used in all-flash arrays or in the application servers themselves. Prices can go through the roof, but so can performance, thanks to the higher-speed interconnect and the proximity of the flash storage to the central processors.
Forrester's report focuses on SSD-only options from three vendors: Nimbus Data Systems, Pure Storage and SolidFire.
While other vendors do offer all-SSD arrays, they don't come with in-line deduplication, so they weren't included in the study, according to Forrester analyst and lead report author Andrew Reichman.
According to Forrester, Nimbus offers the broadest protocol support, allowing users to connect to their controllers via Fibre Channel, Gigabit Ethernet or 10GbE iSCSI, CIFS, NFS or direct InfiniBand.
Recently, eBay rolled out a 100TB Nimbus SSD array to relieve bottlenecks in its NAS and SAN storage. The Nimbus S-Class array cut eBay's rack space needs by 50% and its power use by 78%. Most importantly, the array's performance reduced the time it takes eBay to bring a new virtual machine online from 45 minutes to five minutes.
The Nimbus architecture is based on two x86-based controllers supporting up to 24 eMLC-filled 2U (3.5-in.-high) servers per cluster. One array can hold up to 250TB. Deduplication is optional. Nimbus prices its product per terabyte, charging $10,000 per usable terabyte.
Pure Storage's array is also a dual-controller system; it scales to 22TB, though the vendor plans to add controllers and capacity. Pure Storage is considered a high-availability option because both of its controllers are active. The system's inline compression and deduplication are always on and work down to a 512-byte chunk size. Pure Storage states that, assuming a 5:1 deduplication ratio, its arrays retail for $5 per gigabyte of effective capacity, so 22TB of effective capacity would retail for $110,000. Based on raw capacity with RAID, however, the price is $25 per gigabyte, or $550,000 for 22TB.
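The two Pure Storage price points are tied together by the deduplication ratio: $25 per raw gigabyte divided by 5 gives $5 per effective gigabyte. A quick check of the arithmetic, assuming decimal units (1TB = 1,000GB), as capacity is generally quoted in the storage industry; `array_price` is just a helper for this example.

```python
GB_PER_TB = 1000  # assumption: decimal units


def array_price(capacity_tb, price_per_gb):
    """List price for a given capacity at a given price per gigabyte."""
    return capacity_tb * GB_PER_TB * price_per_gb


RAW_PRICE_PER_GB = 25.0  # raw capacity with RAID, per the vendor figure
DEDUP_RATIO = 5.0        # the 5:1 ratio Pure Storage assumes

effective_price_per_gb = RAW_PRICE_PER_GB / DEDUP_RATIO

if __name__ == "__main__":
    print(effective_price_per_gb)                   # 5.0
    print(array_price(22, effective_price_per_gb))  # 110000.0
    print(array_price(22, RAW_PRICE_PER_GB))        # 550000.0
```

Put the other way, a given budget buys five times as much effective capacity as raw capacity, and the $5 figure holds only if the data really reduces 5:1.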
Lastly, SolidFire plans to go to market in January with a scale-out, clustered storage product filled with MLC flash. The system is expected to include inline deduplication and compression as well as thin provisioning, which allocates capacity to application servers as they need it rather than overprovisioning it up front, as the traditional approach does. To keep prices low, the system, which will offer guaranteed quality of service, supports only the iSCSI protocol, according to Forrester. It is expected to scale to 1,000TB, or 1 petabyte.
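Thin provisioning can be modeled as volumes that advertise a logical size to servers but draw physical blocks from a shared pool only when a block is first written. The classes below are a hypothetical sketch of that idea, not SolidFire's design; block-level allocation and the zero placeholder returned for unwritten blocks are assumptions.

```python
class ThinPool:
    """Shared physical capacity, counted in blocks (illustrative)."""

    def __init__(self, physical_blocks):
        self.free = physical_blocks

    def allocate(self):
        if self.free == 0:
            raise RuntimeError("thin pool exhausted")
        self.free -= 1


class ThinVolume:
    """Volume whose advertised size can exceed what is physically backed."""

    def __init__(self, pool, logical_blocks):
        self.pool = pool
        self.logical_blocks = logical_blocks  # size the application server sees
        self.mapped = {}                      # logical block number -> data

    def write(self, block, data):
        if not 0 <= block < self.logical_blocks:
            raise IndexError("block outside volume")
        if block not in self.mapped:
            self.pool.allocate()  # physical space is consumed on first write only
        self.mapped[block] = data

    def read(self, block):
        # Unwritten blocks have no physical backing; return a zero placeholder.
        return self.mapped.get(block, b"\x00")


if __name__ == "__main__":
    pool = ThinPool(physical_blocks=100)
    vol_a = ThinVolume(pool, logical_blocks=80)
    vol_b = ThinVolume(pool, logical_blocks=80)  # 160 blocks promised, 100 real
    vol_a.write(0, b"x")
    vol_a.write(0, b"y")  # rewrite: no new allocation
    vol_b.write(5, b"z")
    print(pool.free)  # 98
```

The two volumes promise 160 blocks against 100 physical ones, yet only two blocks are consumed; the trade-off is that the pool can run out if servers actually fill what they were promised.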
In its report, Forrester cautions that the all-SSD arrays it references are relatively new to the market (introduced this year or last) and that potential users should scrutinize them for deficiencies such as a lack of snapshots or replication, and for application compatibility issues. An SSD-only storage infrastructure also depends heavily on the effectiveness of deduplication.
"Although some of the cost efficiency comes from both the price declines of flash and use of cheaper versions, you can't come close to the cost of disk without enabling much better data deduplication," Forrester said.
Better deduplication requires three things: enough CPU capacity to run the inline process without diminishing performance, an effective deduplication algorithm, and data that lends itself to reduction, such as email and other documents.
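Why the data matters can be shown by measuring a deduplication ratio directly. This hypothetical helper assumes fixed-size chunks and SHA-256 fingerprints; hashing every chunk on the write path is also where the CPU cost of inline deduplication comes from. Fifty copies of the same chunk (think of one attachment mailed to many people) reduce 50:1, while random bytes, standing in for data with no repetition, do not reduce at all.

```python
import hashlib
import os


def dedup_ratio(data, chunk_size=4096):
    """Logical bytes divided by bytes left after removing duplicate chunks."""
    unique = {}  # fingerprint -> chunk length
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # Every chunk is hashed: this per-write work is the CPU cost of dedupe.
        unique.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    stored = sum(unique.values())
    return len(data) / stored if stored else 1.0


if __name__ == "__main__":
    attachment = os.urandom(4096)
    repetitive = attachment * 50          # the same content stored 50 times
    random_data = os.urandom(4096 * 50)   # no repeated chunks
    print(f"repetitive: {dedup_ratio(repetitive):.1f}:1")
    print(f"random:     {dedup_ratio(random_data):.1f}:1")
```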
"While dedupe can bring the all-SSD architecture closer to the cost of disk, it remains to be seen whether this can be a viable alternative to disk-based systems," Reichman said.
Lucas Mearian covers storage, disaster recovery and business continuity, financial services infrastructure and health care IT for Computerworld. Follow Lucas on Twitter at @lucasmearian.