Dropbox moves 90% of its data off Amazon AWS in favor of its own private cloud. Dropbox built its own custom storage servers to store half an exabyte or more, mirrored across three regions.
Sometimes, you get so big that public cloud pricing doesn't make sense any longer. But building those servers from scratch and moving all that data sound like an enormous undertaking.
In IT Blogwatch, bloggers feel thirsty for a delicious ginger beer. [You're fired -Ed.]
Your humble blogwatcher curated these bloggy bits for your entertainment.
What's the craic? Cade Metz calls it an Epic Story:
If you’re one of 500 million people who use Dropbox, it’s just a folder. ... Peer behind that folder, however, and you’ll discover an epic feat of engineering.
…
For the first eight years of its life, you see, Dropbox stored billions and billions of files...atop what is commonly called “the Amazon cloud,”...rather than machines owned and operated by Dropbox. [But] over the last two-and-a-half years, Dropbox built its own vast computer network [and] moved about 90 percent of those files onto this new online empire.
…
Today, more and more companies are moving onto “the cloud”—not off. ... But some companies get so big, it actually makes sense to build their own. ... Dropbox says it’s now that big.
…
Amazon...declined to comment. ... Dropbox has built its own box.
What's the benefit for Dropbox? Iain Thomson explains—Dropbox slips 500PB into its Magic Pocket:
Dropbox has sucked the vast majority of its data off Amazon. ... The firm has always kept metadata...in-house, but [is] now handling 90 percent of its own storage.
…
Dropbox can...increase performance by taking out the public cloud lag. [And] the company [says it] can save money through hardware and software optimization.
…
On one level, going in-house is a sign of the maturity of the company. ... But it has also got to be a fairly pricey operation to run.
Want more detail? Here's Dropbox's Akhil Gupta—Scaling to exabytes and beyond:
Years ago, we called Dropbox a “Magic Pocket.” ... And when our scale required building our own...infrastructure, we named the project “Magic Pocket.”
…
We’ve always had a hybrid cloud architecture. ... We were an early adopter of Amazon S3, which provided us with the ability to scale our operations. ... We couldn’t have grown as fast as we did without...AWS.
…
Our use case for block storage is unique. We can...customize both the hardware and software [for] better unit economics. ... It was clear to us...we’d have to build everything from scratch.
…
February 27, 2015 was D-day. For the first time...we began storing and serving [some] user files. ... On April 30, 2015, we began the race to install additional servers in three regional locations fast enough. ... We built a high-performance network [that] allowed us to transfer data at...over half a terabit per second.
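To put Gupta's network figure in perspective, here's a rough Python back-of-envelope estimate of how long a single pass over roughly 500PB would take at half a terabit per second. It assumes decimal units, a sustained rate with zero protocol overhead, and only one copy of the data (mirroring across three regions would multiply the bytes moved). None of those assumptions comes from Dropbox.

```python
SECONDS_PER_DAY = 86_400

def transfer_days(data_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to move data_bytes at a constant link rate, ignoring overhead."""
    return data_bytes * 8 / link_bits_per_second / SECONDS_PER_DAY

HALF_EXABYTE = 500e15   # ~500 PB in decimal bytes (assumed size of one copy)
HALF_TERABIT = 0.5e12   # "over half a terabit per second", taken as a floor

days = transfer_days(HALF_EXABYTE, HALF_TERABIT)
print(f"{days:.1f} days")  # prints: 92.6 days
```

That works out to roughly three months of flat-out transfer for a single copy, before any of the real-world overheads.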
What color is your parachute? That's Mike Wheatley's angle—Dropbox bails out:
More than anything else, [it] highlights the argument for the private cloud [at] a certain scale. While some companies...have gone all-in on the AWS cloud, others have flitted back and forth. Games maker Zynga [shows] the grass isn’t always greener on the other side.
…
Dropbox’s own case suggests that it’s reached a scale at which its incremental costs will actually go down. [But] it first has to build its own infrastructure and maintain it. [And] if the company fails to scale...those cost savings may never materialize.
Dropbox's "Diskotech" custom storage boxes sound similar to what Backblaze is doing. riobard runs the numbers:
Basically Diskotech stores 1PB in 18" × 6" × 42" = 4,536 cubic inch volume, which is 10% bigger than standard 7U. [Backblaze] is [storing] 180TB in 4U. ... Doing the math reveals that Dropbox is basically packing 793TB in 4U.
…
Diskotech is about 30% bigger in volume than [Backblaze] Storage Pod 5.0 but with 470% more storage. ... Amazing engineering.
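riobard doesn't say which enclosure dimensions he used for "4U," so the TB-per-4U figure depends heavily on that choice. This Python sketch reproduces the volume arithmetic and scales Diskotech's density to a hypothetical 4U box. The 19-inch width and 27-inch depth are assumptions chosen for illustration; 1.75 inches per rack unit is the standard height.

```python
DISKOTECH_TB = 1000            # "1PB", in decimal terabytes
DISKOTECH_IN3 = 18 * 6 * 42    # 4,536 cubic inches, per riobard
RACK_UNIT_IN = 1.75            # height of one rack unit, in inches

def tb_in_volume(volume_in3: float) -> float:
    """Capacity a box of volume_in3 would hold at Diskotech's TB-per-cubic-inch density."""
    return DISKOTECH_TB / DISKOTECH_IN3 * volume_in3

# Hypothetical 4U enclosure: full 19" rack width, 27" depth (both assumptions).
four_u_in3 = 19 * (4 * RACK_UNIT_IN) * 27
print(f"{DISKOTECH_IN3} in^3, {tb_in_volume(four_u_in3):.0f} TB per 4U")
# prints: 4536 in^3, 792 TB per 4U
```

With those assumed dimensions the result lands close to riobard's 793TB. A shallower or narrower box shrinks the figure proportionally.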
And here's some more insider info. Courtesy of Dropbox's jamwt:
One of the games in this project is optimizing how little memory and compute you can use. ... We utilize lots of tricks like perfect hash tables, extensive bit-packing...cache-friendly data structures...lockfree object pooling stuff for big byte vectors. ... It's much easier to do these particular kinds of optimizations using C++ or Rust.
…
Performance is 3-5x better at tail latencies. Cost savings is... dramatic. I can't be more specific.
…
[Diskotech] supports host-managed...disks that come in 10T and 14T sizes. ... We have flash caches (NVMe) in the machines, but the long-term storage devices are spindle--PMR and SMR.
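jamwt doesn't describe Dropbox's actual data layouts, but the general idea behind "extensive bit-packing" is easy to show. This Python sketch packs three fields of a hypothetical index entry into one 64-bit word, so each entry costs 8 bytes instead of three separate integers. The fields are a disk number, a byte offset, and a length, and their widths are chosen purely for illustration. In C++ or Rust the same trick maps directly onto a plain `uint64_t` or `u64` array.

```python
import struct

# Hypothetical field widths for a 64-bit index entry (illustrative only).
DISK_BITS = 8      # up to 256 disks per machine
OFFSET_BITS = 44   # byte offsets up to 16 TiB, enough for a 14TB drive
LENGTH_BITS = 12   # length in 4 KiB blocks (hypothetical unit), up to 4095

def _check(value: int, bits: int, name: str) -> None:
    """Reject values that would overflow their bit field."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")

def pack(disk: int, offset: int, length: int) -> int:
    """Pack three unsigned fields into one 64-bit integer: disk | offset | length."""
    _check(disk, DISK_BITS, "disk")
    _check(offset, OFFSET_BITS, "offset")
    _check(length, LENGTH_BITS, "length")
    return (disk << (OFFSET_BITS + LENGTH_BITS)) | (offset << LENGTH_BITS) | length

def unpack(word: int) -> tuple[int, int, int]:
    """Reverse of pack(): recover (disk, offset, length) with shifts and masks."""
    length = word & ((1 << LENGTH_BITS) - 1)
    offset = (word >> LENGTH_BITS) & ((1 << OFFSET_BITS) - 1)
    disk = word >> (OFFSET_BITS + LENGTH_BITS)
    return disk, offset, length

entry = pack(disk=3, offset=123_456_789_012, length=1024)
raw = struct.pack("<Q", entry)   # serializes to exactly 8 bytes
assert len(raw) == 8
assert unpack(struct.unpack("<Q", raw)[0]) == (3, 123_456_789_012, 1024)
```

Storing millions of such words contiguously also helps with the cache-friendliness jamwt mentions, since a lookup touches one machine word rather than several scattered objects.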
You have been reading IT Blogwatch by Richi Jennings, who curates the best bloggy bits, finest forums, and weirdest websites… so you don’t have to. Catch the key commentary from around the Web every morning. Hatemail may be directed to @RiCHi or itbw@richi.uk.
Opinions expressed may not represent those of Computerworld. Ask your doctor before reading. Your mileage may vary. E&OE.