The high speed of flash storage often makes it easy to justify its high price. But with Dell's new approach to flash tiering, that justification may no longer be necessary. Though the latest release of Dell's Compellent Storage Center includes new hardware goodies such as a very high-density 3.5-inch SAS enclosure and a raft of updates to the array's management and host integration software, the really big news is support for automated tiering between write-optimized SLC (single-level cell) SSDs and read-optimized MLC (multilevel cell) SSDs.
The Compellent system could already automate tiering between expensive, low-capacity SLC SSDs and spinning disk. However, this new blend of the two predominant SSD techs allows Dell to claim it can deliver an all-flash solution for the price of disk. Using list price as a comparison, the same money that you might have spent on a Compellent array with 72 146GB 15,000-rpm SAS disks will now buy you a similarly licensed array with six 400GB write-optimized SLC SSDs and six 1.6TB read-optimized MLC SSDs.
Even better, that pure SSD configuration can deliver three times the transactional performance (using a TPC-C benchmark), 85 percent less latency, and 15 percent more capacity while consuming 50 percent less power and rack space than its disk-based counterpart. In other words, if you have the right balance of performance and capacity requirements to effectively leverage it, this innovation could save you a wad of cash, deliver a huge performance windfall, or both.
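The capacity side of that comparison is easy to sanity-check. The sketch below uses only the drive counts and sizes quoted above and adds up raw capacity; it ignores RAID, sparing, and formatting overhead, so it is a rough check rather than a usable-capacity figure.

```python
def raw_capacity_gb(drives):
    """Sum raw capacity for a list of (count, size_gb) pairs."""
    return sum(count * size_gb for count, size_gb in drives)

# Drive counts and sizes come from the comparison above.
disk_config = [(72, 146)]              # 72 x 146GB 15,000-rpm SAS
flash_config = [(6, 400), (6, 1600)]   # 6 x 400GB SLC + 6 x 1.6TB MLC

disk_gb = raw_capacity_gb(disk_config)
flash_gb = raw_capacity_gb(flash_config)
gain_pct = (flash_gb - disk_gb) / disk_gb * 100

print(f"disk:  {disk_gb} GB raw")
print(f"flash: {flash_gb} GB raw")
print(f"flash advantage: {gain_pct:.1f}%")
```

On raw numbers alone the flash configuration comes out roughly 14 percent larger, which is in the neighborhood of the 15 percent figure; the exact number depends on overheads this sketch leaves out.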
Dell's novel approach to tiering does come with a catch. However, understanding this gotcha and its potential effects in the field requires a deeper understanding of SSDs in general, Dell's Data Progression tiering software in particular, and how Dell has leveraged both in this new release.
A crash course in SSDs
Instead of using mechanical spinning platters to store data magnetically, SSDs use solid-state flash memory to do the job. Although flash memory is used in all kinds of devices from iPods to USB sticks, the kinds you'll find in enterprise storage are typically either single-level cell SSDs or multilevel cell SSDs. The differences between the two boil down to the typical balancing act among performance, capacity, and expense.
Generally speaking, there are two enemies of solid-state storage. The first, generally referred to as "write endurance," is that each cell within an SSD can endure a fairly specific number of so-called program-erase cycles before it will no longer be able to store data accurately. Write-endurance figures are generally reflected in "full device writes per day" -- a metric that gives a user an idea of the overall lifecycle of a device.
The second enemy of solid-state storage, a phenomenon referred to as the "write cliff," is associated with the fact that each cell must undergo a (relatively) time-consuming erasure process before it can be written to. If the background process that erases unallocated cells fails to keep up with the write load the device is experiencing, the device will run out of pre-erased cells, and write performance will fall through the floor.
To combat these two problems, both kinds of SSDs -- SLC and MLC -- are typically equipped with more raw capacity than they advertise. This allows the device to spread out write operations over a larger number of cells, which increases overall device endurance and gives the device more cells to keep empty to absorb large write workloads. This slack capacity and the intelligence built into the SSD's controller to manage it are what really separate consumer SSDs from those used in enterprise storage devices. (They also explain the capacity differentials you'll find when you shop the two markets.)
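A toy model makes the relationship between spare capacity and the write cliff easier to see. The sketch below is not how any real SSD controller works: the function name, the block counts, and the per-tick rates are all invented for illustration. It simply tracks a pool of pre-erased blocks that host writes drain and a background eraser refills.

```python
def ticks_until_cliff(spare_blocks, writes_per_tick, erases_per_tick,
                      max_ticks=10_000):
    """Return the tick at which the pre-erased pool cannot cover the
    incoming writes, or None if that never happens within max_ticks.

    All parameters are hypothetical units; this is a toy model.
    """
    free = spare_blocks  # the pool starts out fully erased
    for tick in range(1, max_ticks + 1):
        if writes_per_tick > free:
            return tick                      # write cliff: no erased blocks left
        free -= writes_per_tick              # host writes consume erased blocks
        free = min(spare_blocks, free + erases_per_tick)  # background erase
    return None

# Eraser slightly slower than the write load: the pool eventually runs dry.
print(ticks_until_cliff(spare_blocks=100, writes_per_tick=10, erases_per_tick=8))
# Doubling the spare pool delays the cliff but does not remove it.
print(ticks_until_cliff(spare_blocks=200, writes_per_tick=10, erases_per_tick=8))
# Eraser keeps pace with the write load: no cliff.
print(ticks_until_cliff(spare_blocks=100, writes_per_tick=10, erases_per_tick=10))
```

The point of the model is that extra slack capacity buys time to ride out a burst, while a sustained write load that outpaces the eraser will hit the cliff sooner or later.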
Further, SLC and MLC devices are fundamentally different in that SLC devices store only a single bit per cell while MLC devices store two or more bits per cell. This means that an SLC device needs more cells than an MLC device to store the same amount of data, but each of its cells only has to distinguish between two charge states. Thus, SLC devices can sustain a much larger write workload (usually 25 to 30 full writes per day versus three per day for MLC) and absorb writes three to five times faster, but they are also substantially lower in capacity and more expensive than MLC devices. However, SLC and MLC SSDs are nearly equal in terms of read performance (with MLC perhaps 2 to 3 percent slower) -- a fact that's crucial to understanding Dell's approach to flash tiering.
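To put the endurance figures in perspective, the sketch below converts "full device writes per day" into a daily write budget. The drive sizes are the ones in Dell's entry configuration; pairing them with the low end of the typical endurance ratings quoted above is my assumption, not a published spec for these particular drives.

```python
def daily_write_budget_tb(capacity_gb, full_writes_per_day):
    """Terabytes per day a drive can absorb within its endurance rating."""
    return capacity_gb * full_writes_per_day / 1000

# Assumed ratings: 25 full writes/day for SLC, 3 for MLC (typical figures,
# not vendor specs for these specific drives).
slc_tb = daily_write_budget_tb(capacity_gb=400, full_writes_per_day=25)
mlc_tb = daily_write_budget_tb(capacity_gb=1600, full_writes_per_day=3)

print(f"400GB SLC: {slc_tb:.1f} TB/day")
print(f"1.6TB MLC: {mlc_tb:.1f} TB/day")
```

Under those assumptions the SLC drive, at a quarter of the capacity, can take roughly twice the daily write volume of the MLC drive.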
The Compellent's secret sauce
Even before its acquisition by Dell in 2011, the Compellent system's main claim to fame was its Data Progression (DP) tiering software. DP's job is to free up capacity in faster and more expensive tiers of storage by moving data into progressively slower and more economical tiers.
For example, suppose your Compellent array consists of a top tier of fast, expensive 15,000-rpm SAS disks and a bottom tier of much larger, slower, and less expensive 7,200-rpm NL-SAS disks. Unless you configure it not to, the array will split incoming data into pages and write them across the disks in the top tier. Because Compellent arrays implement RAID at the page level, your array can choose which RAID level to use on a per-page basis. Since writing in RAID10 is faster than RAID5 or RAID6 (given that only two write operations are required and no parity must be computed), it will use RAID10.
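The write-speed argument can be expressed with the usual rule-of-thumb "write penalty" for small random writes: two back-end I/Os for RAID10, four for RAID5 (read data, read parity, write data, write parity), and six for RAID6. These are generic textbook figures, not measurements from a Compellent array, and the function name is made up for the example.

```python
# Conventional back-end I/Os per small random host write (rule of thumb).
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}

def backend_ios(host_writes, raid_level):
    """Estimate back-end disk I/Os generated by a number of host writes."""
    return host_writes * WRITE_PENALTY[raid_level]

for level in ("RAID10", "RAID5", "RAID6"):
    print(f"{level}: {backend_ios(1000, level)} back-end I/Os per 1,000 host writes")
```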
However, top-tier disk capacity is typically limited and fairly expensive. The array won't want to leave that storage sitting there for long unless there's a good reason to. That's where Data Progression comes into play. At some point every day (7 p.m. is the default) DP will run as a background process on the array, moving data to different tiers and changing the RAID level based on how heavily the data has been used and what policies you have set. DP will even differentiate between the faster outer rim of NL-SAS disks and the slower inner tracks, creating a sort of tier within a tier (Dell calls this licensed feature FastTrack).
If that block of data you wrote has been written once and not read again since, it might be moved to the bottom tier and restriped using RAID5. If it had been read more frequently, it might be left in the faster, top-tier storage, but still restriped to RAID5, which is just as fast as RAID10 from a read perspective and takes up quite a bit less space. In both cases, these changes are made by a low-priority process that you'd configure to run at a time when the array isn't under peak demand.
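The two outcomes described above can be sketched as a simple decision function. Everything here is hypothetical: the `Page` fields, the `hot_reads` threshold, and the five-wide RAID5 stripe are stand-ins, since the actual inputs Data Progression weighs aren't something I can see from the outside.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Page:
    tier: str                # "top" or "bottom"
    raid: str                # "RAID10" or "RAID5"
    reads_since_write: int

def nightly_pass(page, hot_reads=1):
    """Pick a tier after the scheduled run; hot_reads is a made-up threshold."""
    tier = "top" if page.reads_since_write >= hot_reads else "bottom"
    # Either way the page is restriped to the more space-efficient RAID5.
    return replace(page, tier=tier, raid="RAID5")

def raw_gb_needed(usable_gb, raid, stripe_width=5):
    """Raw capacity consumed; stripe_width for RAID5 is an assumption."""
    if raid == "RAID10":
        return usable_gb * 2
    if raid == "RAID5":
        return usable_gb * stripe_width / (stripe_width - 1)
    raise ValueError(f"unknown RAID level: {raid}")

cold = nightly_pass(Page("top", "RAID10", reads_since_write=0))
hot = nightly_pass(Page("top", "RAID10", reads_since_write=5))
print(cold)
print(hot)
print(raw_gb_needed(100, "RAID10"), raw_gb_needed(100, "RAID5"))
```

With a five-wide stripe, 100GB of data needs 125GB of raw space in RAID5 versus 200GB in RAID10, which is the space saving the restripe is after.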
All in all, Data Progression's job is to give you the read and write performance of the top tier of disk for the data that needs it, while allowing you to leverage the economy of lower tiers of disk for less frequently used data. In situations where the array is sized properly, DP does this exceedingly well.
The Compellent Enterprise Manager will keep tabs on the usage of your two flash tiers -- the write-intensive SLC SSDs and read-intensive MLC SSDs.
Having your cake and eating it too
Accomplishing this same feat when tiering between two tiers of SSDs is something of a different animal. Whereas Data Progression runs once a day in spinning-disk configurations, it operates continuously in tiered-flash configurations. In the case of tiered flash, DP is also heavily linked to the array's snapshotting mechanism.
Like many fully virtualized arrays, Compellent arrays implement snapshots ("replays" in Compellent parlance) at a page level. When you write data into a volume, that data is split up into pages and written to disk. If you create a snapshot, those pages and any pages written before them are marked in a database as being part of that snapshot, but effectively nothing else happens -- no data is immediately moved anywhere. Later, if some of the volume is rewritten with new data, that data is split up and written into different pages on the disk; the original pages still exist and are ready to be referenced if the snapshot is ever needed. Once a snapshot is deleted, the pages that comprised it are freed to be overwritten.
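The page-level snapshot behavior described above can be modeled in a few lines. This is a minimal sketch of the general technique, not Compellent's implementation: the `Volume` class and its method names are invented, and for simplicity every write lands in a fresh page, with the old page freed immediately if no snapshot still points at it.

```python
class Volume:
    """Toy page-mapped volume with metadata-only snapshots."""

    def __init__(self):
        self.pages = {}       # page id -> data
        self.active = {}      # logical block -> page id (live volume)
        self.snapshots = {}   # snapshot name -> frozen block-to-page map
        self._next_page = 0

    def _referenced(self, page_id):
        if page_id in self.active.values():
            return True
        return any(page_id in snap.values() for snap in self.snapshots.values())

    def write(self, block, data):
        page_id = self._next_page          # new data always gets a new page
        self._next_page += 1
        self.pages[page_id] = data
        old = self.active.get(block)
        self.active[block] = page_id
        if old is not None and not self._referenced(old):
            del self.pages[old]            # nobody needs the old page

    def snapshot(self, name):
        self.snapshots[name] = dict(self.active)   # metadata only, no data moves

    def delete_snapshot(self, name):
        snap = self.snapshots.pop(name)
        for page_id in snap.values():
            if not self._referenced(page_id):
                self.pages.pop(page_id, None)      # page is free to be reused

    def read(self, block, snapshot=None):
        mapping = self.active if snapshot is None else self.snapshots[snapshot]
        return self.pages[mapping[block]]

vol = Volume()
vol.write(0, "original")
vol.snapshot("replay-1")
vol.write(0, "rewritten")
print(vol.read(0), "|", vol.read(0, snapshot="replay-1"))
```

Taking the snapshot copies only the mapping, and the original page survives the rewrite until the snapshot that references it is deleted.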
In spinning-disk configurations, Data Progression treats pages that are part of a snapshot differently than it treats active data. Because it knows the snapshot data is far less likely to be read from once it has been replaced by newer data in the active volume, it will typically move those pages to a more economical tier during its next 7 p.m. run.
However, in tiered-flash configurations, Data Progression doesn't wait for 7 p.m. to roll around to make tiering decisions. Instead, immediately upon the creation of a snapshot, Data Progression will punt data out of the top tier that is backed by expensive, write-optimized SLC SSD and write the data into inexpensive, read-optimized MLC SSDs.
The goal of this process is threefold:
Thus, Dell's approach to flash tiering succeeds in leveraging the best that SLC and MLC devices bring to the table while avoiding the sweeping compromises made by single-tier deployments of "mixed-use SLC" (SLC with less wear-leveling capacity) and "eMLC" (MLC with added wear-leveling capacity). Said another way, it's much more like having a tiered 15K SAS/7.2K NL-SAS spinning-disk array that can give you the benefit of both types of media versus having a single-tier 10K SAS spinning-disk array that gives you something in between.
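The snapshot-triggered migration described above can be sketched as follows. The `TieredFlash` class is hypothetical and tracks only which tier each page lives on: new writes land on SLC, and taking a snapshot moves the pages it freezes to MLC.

```python
class TieredFlash:
    """Toy model: writes land on SLC, snapshots push frozen pages to MLC."""

    def __init__(self):
        self.tier_of = {}     # page id -> "SLC" or "MLC"
        self.unfrozen = set() # pages written since the last snapshot

    def write(self, page_id):
        self.tier_of[page_id] = "SLC"      # write-optimized tier absorbs writes
        self.unfrozen.add(page_id)

    def snapshot(self):
        """Freeze current pages and migrate them to the read-optimized tier."""
        moved = sorted(self.unfrozen)
        for page_id in moved:
            self.tier_of[page_id] = "MLC"
        self.unfrozen.clear()
        return moved

array = TieredFlash()
array.write(1)
array.write(2)
first_moved = array.snapshot()
array.write(3)
print(first_moved, array.tier_of)
```

Because the two flash types read at nearly the same speed, the frozen pages lose little by moving, while the SLC tier is cleared out to take the next round of writes.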
Yes, there's a catch
It's rare that an engineering decision doesn't have some kind of drawback. In this case, the catch is found in the creation of those snapshots that are so vital to the tiered-flash model. If data is immediately moved from the write-optimized SLC tier to the read-optimized MLC tier upon creation of a snapshot, there's an obvious cost to doing that. The load on the SLC tier will increase as data is read out of it and written into the MLC tier, and this can't help but impact performance on the SLC tier whenever host I/O is driving those SSDs to their limits. Worse yet, pages that are migrated from one tier to the next have to be locked during the operation, and this can cause contention in very high-I/O situations given that committing data to the MLC tier takes three to five times longer than reading it from the SLC tier.
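A quick calculation shows why the MLC write dominates the time a page stays locked. The sketch assumes, purely for illustration, that a migration is a sequential read from SLC followed by a write to MLC and that the lock covers both steps; only the three-to-five-times ratio comes from the discussion above.

```python
def mlc_write_share(write_to_read_ratio):
    """Fraction of a read-then-write migration spent on the MLC write."""
    return write_to_read_ratio / (1 + write_to_read_ratio)

for ratio in (3, 5):
    share = mlc_write_share(ratio)
    print(f"write {ratio}x slower than read: {share:.0%} of lock time is the MLC write")
```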
To test the impact of this, I created a worst-case scenario in the lab. I set up a series of volumes and started directing a breakneck read and write load at all of them. In my case, it was a stream of randomized 4K I/Os with a 70/30 mix of reads versus writes (very roughly approximating an OLTP workload). This workload was isolated to a fairly small footprint on the array (about 80GB in total).
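For readers who want to picture that workload, the sketch below generates a comparable stream of operations: random 4K-aligned offsets across an 80GB footprint with a 70/30 read/write mix. It is not the load generator used in the test; the function name, the seed, and the use of decimal gigabytes are assumptions, and it only produces the operation list rather than issuing any I/O.

```python
import random

BLOCK_SIZE = 4 * 1024              # 4K I/Os
FOOTPRINT_BYTES = 80 * 10**9       # "about 80GB"; decimal GB assumed
READ_FRACTION = 0.70               # 70/30 reads versus writes

def generate_ops(count, seed=0):
    """Yield (kind, offset, length) tuples for a random 70/30 4K workload."""
    rng = random.Random(seed)
    blocks = FOOTPRINT_BYTES // BLOCK_SIZE
    for _ in range(count):
        kind = "read" if rng.random() < READ_FRACTION else "write"
        offset = rng.randrange(blocks) * BLOCK_SIZE   # 4K-aligned offset
        yield kind, offset, BLOCK_SIZE

ops = list(generate_ops(10_000))
reads = sum(1 for kind, _, _ in ops if kind == "read")
print(f"{reads / len(ops):.1%} reads across {len(ops)} operations")
```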
Enterprise Manager will also help you keep an eye on the health and wear of the SSDs.
At first, the entry-level "6+6" (SLC+MLC) configuration handled this workload entirely with the SLC tier and clocked in at more than 70,000 IOPS with sub-5ms latencies -- truly impressive considering a similarly priced spinning-disk array would be hard-pressed to serve up a third of those IOPS with three times the latency. However, things took a turn for the worse when I created a snapshot that simultaneously impacted all the volumes I was throwing my workload against. The I/O stream came to a screeching halt -- immediately dropping to about 3,500 IOPS and slowly crawling back up to its previous speed over a period of a few minutes.
Any storage admins out there who are reading this right now will realize how crippling that could be in a production scenario. Having your storage throughput suddenly drop by 95 percent and your storage system take minutes to recover because you created a snapshot would be very bad indeed (think every phone in the help desk ringing at once). However, good storage admins will also recognize how incredibly unlikely this scenario is in most real-world situations.
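The 95 percent figure follows directly from the two measurements in the test, rounded as reported above:

```python
baseline_iops = 70_000   # "more than 70,000" before the snapshot
trough_iops = 3_500      # "about 3,500" immediately after it

drop_pct = (baseline_iops - trough_iops) / baseline_iops * 100
print(f"throughput drop: {drop_pct:.0f}%")
```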
The production loads you'll find out in the field are generally very bursty on a subsecond basis. That is, if you were to create a graph of the duty cycle of a primary storage array in a typical enterprise with a resolution of 10ms or 20ms, you'd see it bounce around all over the place. The array could be very busy, but still have a lot of slack space where no transactions were being executed. It's in this space where the on-demand portion of Compellent's Data Progression software gets its work done, and where it can work without impacting host I/O to any great degree.
In my artificial lab test, however, the array was being pushed to its limit -- effectively creating a 100 percent duty cycle. This left no room for Data Progression to do its work and created enough congestion between host writes and time-consuming SLC to MLC data migrations to cripple overall performance.
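The effect of duty cycle on background migration can be illustrated with a toy simulation. Time is divided into slots; a slot is either taken by host I/O (with probability equal to the duty cycle) or left idle, and the migration only advances in idle slots. The function and its parameters are invented for this sketch. A real array lets migrations compete with host I/O rather than starving them outright, which is why the lab test slowed down rather than stalling forever.

```python
import random

def slots_to_drain(migration_units, duty_cycle, max_slots=100_000, seed=0):
    """Slots needed to finish a migration using only idle slots.

    Returns None if the migration has not finished within max_slots.
    """
    rng = random.Random(seed)
    remaining = migration_units
    for slot in range(1, max_slots + 1):
        host_busy = rng.random() < duty_cycle
        if not host_busy:
            remaining -= 1               # idle slot: migrate one unit
            if remaining == 0:
                return slot
    return None

for duty in (0.0, 0.5, 0.9, 1.0):
    print(f"duty cycle {duty:.0%}: {slots_to_drain(100, duty)}")
```

As the duty cycle approaches 100 percent, the time to finish the migration grows sharply, and at a true 100 percent it never finishes in this model.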
In the real world, if you have a write-heavy workload that requires the full raw performance of SLC flash 24/7, you probably won't want to leverage Compellent's Data Progression software at all. Instead, you'll deploy enough SLC capacity to hold all the data you intend to hit this way and configure the storage policy that applies to it to prevent Data Progression from moving it out of SLC flash. That would neatly sidestep the entire issue while still allowing less brutally assaulted volumes on the same array to take advantage of the economy presented by SLC/MLC tiering.
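Sizing the SLC tier for pinned data is a simple check. The sketch below assumes the pinned volumes stay in RAID10 (half of raw capacity is usable) and ignores spares and other overhead; the function names and the 0.5 factor are illustrative assumptions, not figures from Dell.

```python
def slc_usable_gb(drive_count, drive_gb, raid_factor=0.5):
    """Usable SLC capacity; raid_factor=0.5 assumes RAID10 mirroring."""
    return drive_count * drive_gb * raid_factor

def fits_pinned(pinned_gb, drive_count=6, drive_gb=400, raid_factor=0.5):
    """True if the data pinned to SLC fits in the tier's usable capacity."""
    return pinned_gb <= slc_usable_gb(drive_count, drive_gb, raid_factor)

print(slc_usable_gb(6, 400))    # entry configuration: six 400GB SLC drives
print(fits_pinned(80))          # the 80GB test footprint
print(fits_pinned(2000))        # a footprint that would need more SLC drives
```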