Yahoo to manage ‘hundreds of thousands of servers’ using OpenStack

Yahoo is set to manage its huge data centre operations with OpenStack, meaning the open source platform will be responsible for provisioning “hundreds of thousands of servers” by the end of the year.

The 20-year-old internet firm has been a significant contributor to the open source project for a few years, running the software across parts of its virtualised and bare metal infrastructure. It now plans to complete the migration of its entire estate in the coming months.

“Yahoo was one of the first mega-scale infrastructures in the world. By mega-scale I mean servers in the hundreds of thousands,” said Yahoo’s principal architect, James Penick.

“The problem is that when you are one of the first ones out there that reaches that scale, there is no product that can support you. So we had to build a lot of bespoke tools to handle that. We had hundreds of thousands of physical servers and tens of thousands of VMs, and at the moment those tens of thousands of VMs are handled by OpenStack, and tens of thousands of bare metal servers are currently managed by OpenStack.

“By the end of this year we anticipate having the majority of those hundreds of thousands of servers all managed entirely by OpenStack. I keep joking that we are building one of the largest bare metal clouds in the world.”

Bare metal migration to Ironic

A significant part of Yahoo’s data centre estate consists of bare metal servers: single-tenant servers that do not run virtualisation.

“The truth is that there are some workloads for which bare metal is still better. VMs are great, super fast. Containers are lightweight and super fast. But there are still use cases where bare metal is faster and more powerful,” he said.

“So grid computing is an example of this, and there are other high-usage applications where you need low latency, where you need nothing between you and serving traffic. So in those cases you use bare metal.”

As it grew its OpenStack deployment, Yahoo had relied on a relatively rudimentary method of managing its bare metal environment, using the Grizzly release of OpenStack’s Nova compute module.

However, the firm is now moving to Ironic, OpenStack’s bare metal module, which was available as an ‘incubated project’ for six months and is now fully available with the latest Kilo release.

“We used Nova bare metal to [manage bare metal servers]. We did this because Ironic was not ready yet. But we really needed to get the company moving, we really needed to get this API in front of our infrastructure,” Penick explained.

“So the whole time we have been working aggressively to develop Nova bare metal, and move the entire company on to it, we have also simultaneously been investing very heavily in Ironic.

“And then, pretty much at the same time as we finish moving the entire company to OpenStack bare metal, we will be finished developing and deploying Ironic. Then we are going to move all that hardware again, pick it up and drop it over to Ironic.”

“About this time next year, we should have hundreds of thousands of compute resources managed by OpenStack Ironic, and none with bare metal Grizzly code.”

OpenStack challenges

Yahoo's infrastructure would dwarf that of many public cloud providers. One of the challenges Penick says he faces with OpenStack is running a hyperscale environment as an internal private cloud, from a business and financial process perspective.

For example, this means ensuring that users are provisioning the correct infrastructure for their application's requirements, rather than consuming large volumes of expensive hardware.

"In a public cloud you generally have a pretty big quota that you can consume, because it is great: from a public cloud perspective I want to give [users] lots of quota, because if you go out and use it, you are going to give me money in return," he said.

"Private cloud is a little bit different. We have to make sure that you are actually only using as much as we are prepared to give you, within certain constraints, obviously with a good amount of headroom.

"So if we say ‘yes, go ahead and boot some servers, and you can boot these with some old 15k SAS drives’ and you turn around and boot a bunch of machines with SSDs, that is a problem. We need to make sure that you are only booting the hosts that you have been approved for, especially if there are some that cost a great deal of money and serve a specific use case. So we need a way of doing quota by flavour."

Copyright © 2015 IDG Communications, Inc.