When it comes to storage, cache is king

“Tape is Dead, Disk is Tape, Flash is Disk, RAM Locality is King.” So said the late Jim Gray of Microsoft Research in 2006. While tape never really died, a new wave of vendors has picked up his original theme, predicting that flash storage is on the verge of replacing traditional disks.

Yet, attractive as the idea of replacing all your spinning disks with flash or solid state may be, unless you’re making uberdollars from real-time currency trading, it’s probably too expensive to consider a wholesale swap-out of an entire disk-based enterprise data infrastructure. One approach to the higher cost of flash is to create yet another tier of storage and put only your high-value data on flash, but even with good tools, moving data into and out of this new “Tier-0” will still introduce new points of management.

Traditional storage tiering also tends to lead to overprovisioning of the highest tiers. People will always want their data on Tier-0 or the “Gold SLA,” even when it’s not justified by their workload characteristics. Perhaps the best way of dealing with this is to remove the politics of storage provisioning and automatically move only the hottest data to the fastest storage.

There are plenty of good ways to do this, and the more you automate the process, the less flash storage looks like a tier and the more it looks like a write-behind or write-through cache. By thinking of auto-tiering as a form of caching, we can begin to answer one of the most frequently asked questions regarding flash, which is “how much do I really need?”

A good answer to that question can be found in an ACM paper by Goetz Graefe titled “The Five-Minute Rule 20 Years Later (and How Flash Memory Changes the Rules).” Published in 2008, it expands on work by Jim Gray and Gianfranco Putzolu on the economics of sizing caches, published some twenty years earlier.

If you like deep-diving into tech, and you’re interested in how flash is changing computing architecture, then you should read that paper. It deals with a whole stack of stuff that most people are just coming to terms with today, including whether flash memory should be thought of as a special part of main memory or as a special part of persistent storage. It also has some interesting perspectives on drive wear-out, and makes the case that an SSD will absorb more writes than a disk over its typical lifecycle.

The five-minute rule that Jim Gray originally came up with was an elegant way to balance the price-per-IOP and price-per-GB characteristics of RAM and disk. It also enabled him to make some remarkably accurate predictions about future costs. The formula can be paraphrased as follows:

From an economic point of view, your cache should be big enough to hold at least 5 minutes’ worth of your most recent data.

Or, more formally:

BreakEvenIntervalInSeconds = (PagesPerMBofRAM / AccessesPerSecondPerDisk) × (PricePerDiskDrive / PricePerMBofRAM)
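To make the formula concrete, here is a minimal Python sketch that simply evaluates it. The function name and the sample inputs (128 pages per MB of RAM, 64 accesses per second per disk, a $2,000 drive, RAM at $15 per MB) are illustrative assumptions rather than figures from this article, so plug in your own prices and device characteristics.

```python
def break_even_interval_seconds(pages_per_mb_of_ram,
                                accesses_per_second_per_disk,
                                price_per_disk_drive,
                                price_per_mb_of_ram):
    """Evaluate the five-minute-rule formula as paraphrased in the text."""
    # Technology ratio: how many pages a MB of RAM holds versus how many
    # random accesses per second one disk can serve.
    technology_ratio = pages_per_mb_of_ram / accesses_per_second_per_disk
    # Economic ratio: the price of a disk drive versus the price of a MB of RAM.
    economic_ratio = price_per_disk_drive / price_per_mb_of_ram
    return technology_ratio * economic_ratio


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the arithmetic.
    interval = break_even_interval_seconds(
        pages_per_mb_of_ram=128,
        accesses_per_second_per_disk=64,
        price_per_disk_drive=2000.0,
        price_per_mb_of_ram=15.0,
    )
    print(f"Break-even interval: {interval:.0f} s (~{interval / 60:.1f} min)")
```

With those assumed inputs the result is roughly 267 seconds, a little under five minutes. A page that is re-read more often than the break-even interval is cheaper to keep in the faster tier; one that is re-read less often is cheaper to leave on the slower one.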

Goetz Graefe then works through the same formula using the costs and performance characteristics for flash, and he comes up with a couple of really interesting new findings:

  • If you replace disk with flash as the next tier in the storage/memory hierarchy, then you need much less RAM. In other words, the cost of flash can be offset by a reduction in the amount of money spent on RAM. (However, I’ve never seen that approach used in any metrics around the storage economics of flash.)
  • You should be keeping about 2½ hours of your most recently accessed data inside your flash cache or tier. That means you should size your flash to hold the entirety of your active data. This surprised me, because while it certainly makes sense from a performance perspective, I wasn’t sure it could be economically justified. But if Jim’s formula is right, then there’s a pretty solid business case based on hard metrics.

But does this really tell you how much flash you should implement? Unfortunately, the information about how old the data is in your cache can be hard to find, and figuring out how much cache will result in an optimum mix often seems to involve:

  • pulling out a bunch of black candles,
  • finding a goat, and
  • asking the infrastructure team to forge an alliance with the dark powers of predictive analytics.

While I believe that the instrumentation around this will improve over time, a relatively simple way of determining the optimum cache size today is to make your SSD cache a little larger than your active data. Figuring that out involves collecting and analyzing workload traces from your own environment, or you can fall back on simple rules of thumb like the following:

  • For workloads where the total usable data is less than a TB, assume a 10% working set size.
  • For workloads where the total usable data is more than a TB, assume a 5% working set size.
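Those rules of thumb are easy to turn into a back-of-the-envelope calculator. The Python sketch below is only that: the function names, the decimal definition of a TB, the treatment of exactly 1 TB as the larger class, and the 20% headroom standing in for “a little larger” are all assumptions of mine, not measured values.

```python
GB_PER_TB = 1000  # assumption: decimal terabytes


def working_set_gb(total_usable_gb):
    """Estimate the active data using the 10% / 5% rules of thumb."""
    # Under a TB: assume 10% is active. Otherwise assume 5%.
    # Treating exactly 1 TB as the larger class is an arbitrary choice.
    fraction = 0.10 if total_usable_gb < GB_PER_TB else 0.05
    return total_usable_gb * fraction


def suggested_flash_cache_gb(total_usable_gb, headroom=1.2):
    """Size the cache a little larger than the estimated working set.

    The default headroom of 1.2 is a hypothetical placeholder.
    """
    return working_set_gb(total_usable_gb) * headroom


if __name__ == "__main__":
    for total in (500, 20_000):
        print(f"{total} GB usable: working set ~{working_set_gb(total):.0f} GB, "
              f"cache ~{suggested_flash_cache_gb(total):.0f} GB")
```

Note that the estimate jumps at the 1 TB boundary, which is a reminder that this is a coarse starting point and no substitute for workload traces from your own environment.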

Exactly how much cache makes economic sense will depend on a bunch of things, including the page size used to swap things into and out of cache, whether you can do things like cache compression, and the relative costs of cache and disk in your environment.

But in the longer term, the trends seem clear: competition between vendors and the falling prices of solid-state memory mean that all of your active data will be sitting on flash (or some newer form of storage-class memory). If you then combine that with the compelling economics for cool and cold storage from the hyperscale cloud vendors, your datacenter and data management will end up looking very different from the way they do today, all driven by storage-class memory caches.
