A hyperscale cloud data center looks different from an enterprise data center, or even a large hosting provider. The problems they face are different from the problems you face. And your approach to everything from how you choose a site to how you manage power to how long you keep servers is not their approach.
Here are some of the significant differences to think about if you’re considering a hybrid cloud that involves you running Azure Stack or a hyperconverged infrastructure:
- When your hard drives fail, it’s nearly always because of vibration issues. A cloud data center runs on systems so carefully designed that the main reason for failure is humidity.
- You care about maintenance and schedule patching carefully, cluster by cluster or even server by server; a cloud data center cares about self-managing, self-healing automation and thinks in terms of a “stamp” that might have 800 servers in it.
- You virtualize workloads to fully utilize processors; they reserve 20 of the 800 servers in the cloud data stamp for running management software.
- As your servers age, you repurpose them for less-demanding workloads; a cloud data center buys servers that are delivered racked and stacked, often inside a container, and after three to five years they’ll be forklifted out and replaced en masse by new racks of servers that have lower operating costs.
- You care about the cost of power and cooling, and access to power can stop you expanding your workloads; a cloud data center chooses a location specifically because it reduces cooling costs, has a power line coming straight from a hydroelectric power plant, and sees doubling the size of the data center (from one mile long to two) as an opportunity to move to a new generation of hardware and a new way of laying out the data center.
- You might worry about getting connectivity to your data center from two different providers; a cloud data center invests in its own underwater data cables.
And then there’s the matter of scale …
“I have to grow my network exponentially through 2020,” Rick Bakken, Microsoft senior director for data center evangelism, tells us. “For certain regions the capacity plan looks more like a flagpole than a hockey-stick chart.”
Even the physical infrastructure is enormous: The latest data center facility Microsoft is building in Quincy, Washington has 24,000 miles of network cable, which is nearly enough to go around the earth, and the Azure data center in Singapore has twice that, as well as enough concrete to build a sidewalk from London to Paris.
A rare look inside Azure
Seeing inside a hyperscale data center like the one in Quincy is one of the fastest ways to convince cloud skeptics about the security of public cloud. That’s slightly ironic because that security makes it pretty hard to make a visit. You can’t even send your audit team to check the facility.
Microsoft recently gave CIO.com a tour, as part of the first group of journalists to see inside an Azure data center in a decade, and there were plenty of restrictions: no photographs or recordings and no information that would compromise the security of the facility (the images here were supplied by Microsoft).
Outside are high plains that get over 300 days of sunshine and only eight inches of rain a year (and a foot of snow), with the temperature averaging 50 degrees most of the year, peaking in the high 80s for two or three weeks in the summer. It’s an arid climate that suits the local fruit growers and also makes it efficient to cool a data center. And the nearby Columbia River produces plenty of power, which is why Microsoft chose Quincy for a data center site back in 2006 (as have Dell and Yahoo, as well as data center providers like Vantage and Sabey).
The buildings are anonymous, with no Microsoft sign. The newest facility has an anti-scale fence on a raised berm so you can’t drive past and see how it’s laid out (and while the signs for the individual buildings will look familiar if you’ve visited a Microsoft campus, they don’t have the Microsoft name or logo and you can’t see them through the fence).
Inside are significant security measures: biometrics and double doors for employees to go through — and those employees have had background checks that involve fingerprints and looking at police records. Even the shipping and receiving department, which has the kind of giant, ceiling-high shelves you expect at warehouse stores, has inner and outer doors that can’t be open at the same time. More biometric locks secure individual rooms (hand scanners in older buildings, fingerprint scanners in newer facilities).
You’ll hit checkpoints at various points inside the buildings where guards use scanning wands to make sure you’re not bringing anything in or taking anything out. Microsoft uses what Bakken calls a “white glove removal” process, dismantling old equipment to recycle — except no hard drives leave the building. If they’ve been used to store lower business impact data, they’ll be recycled internally, but if they’ve stored high business impact data (some racks are tagged HBI), they go in the disk shredder. After shredding, “there’s nothing left larger than a BB pellet.”
Short on staff and maintenance
Your data center probably doesn’t have corridors so long that there are kick scooters standing around for operations staff who need to get to a distant room. And even though enterprise data centers are smaller than hyperscale cloud data centers, you likely have more staff. The critical ops teams in an Azure data center are far smaller than you’d expect (tens to several dozen people, depending on the size of the data center, is as precise as Microsoft will say, though there are three times as many guards per shift), and they have very different skills.
They’re not replacing failed network cards and hard drives, updating firmware or scheduling maintenance windows. They’re running the automation and ignoring hardware failures because those are taken care of automatically.
“Outages happen, people make mistakes, software has bugs,” says Bakken, “so let’s make it self-healing. If something breaks I want to know it broke but I have a set of provisions and contingencies that protect and heal the system. As far as OpEx, with the newer data centers I replace the filters [in the cooling systems] and that’s about the only maintenance I have. We’ve moved to a resiliency configuration where I put more servers in each box than I need and if one breaks I just turn it off, walk away and wait until the next refresh cycle.”
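To get a feel for what “more servers in each box than I need” implies, here is a rough back-of-the-envelope sketch. It is not Microsoft’s capacity model: the function name, the idea of a fixed annual failure rate, and the example numbers are all assumptions made up for illustration. It simply asks how many servers you would have to install up front so that, with failed machines switched off and never repaired, enough are still expected to be running at the end of the refresh cycle.

```python
import math


def servers_to_install(needed: int, annual_failure_rate: float, years: int) -> int:
    """Estimate how many servers to install so that `needed` are still
    expected to be working after `years` with no repairs.

    Assumes (hypothetically) that each server fails independently with the
    same probability every year and that failed servers are simply turned off.
    """
    if not 0.0 <= annual_failure_rate < 1.0:
        raise ValueError("annual_failure_rate must be in [0, 1)")
    # Fraction of servers expected to survive the whole refresh cycle.
    survival = (1.0 - annual_failure_rate) ** years
    return math.ceil(needed / survival)


if __name__ == "__main__":
    # Illustrative numbers only: 780 working servers wanted, a 3 percent
    # annual failure rate, and a four-year refresh cycle.
    installed = servers_to_install(needed=780, annual_failure_rate=0.03, years=4)
    print(f"Install {installed} servers to expect 780 survivors after 4 years")
```

The trade-off is extra hardware bought up front in exchange for never sending a technician to swap a part, which only pays off when the automation can route around dead machines on its own.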
From buildings to containers — and back
That refresh cycle usually also means dramatic changes to the data center architecture. When you buy servers for your data center, you get them from an OEM like Dell or HP. Microsoft used to do that, buying in bulk or even a container at a time. Now it’s designing its own servers to get exactly what it needs at lower cost, ordering them from ODMs and contributing the server design to the Open Compute Project (OCP).
The various facilities in Quincy are a microcosm of these changes. The two oldest buildings, which Microsoft refers to as generation 2, look like a traditional data center, but unlike the average enterprise data center they’re not crammed with racks and servers. As Microsoft switches to the new OCP servers here, the routers and load balancers disappear in favor of virtualized networking — and rooms that used to have 18 rows of racks now have only eight, because the compute and power density is so much higher. “We have the same power budget; we have more servers, but fewer of the racks are populated because of the power budget,” explains Bakken.
The hot aisle (which gets up to 106 degrees) is isolated by the kind of transparent plastic panels you see insulating industrial refrigeration areas. The roof has recently been sprayed white to improve power efficiency, an obsessive attention to detail with roots going back over a decade, to when Bakken worked on Steve Ballmer’s capacity planning team: “We came to the realization that we were building really large air conditioners; we were in the industrial air conditioning business.”
The solution wasn’t AC. Microsoft was able to reduce the amount of power it needed for cooling by switching first to using external air cooling and then to adiabatic cooling, which works on the same principle as a “swamp cooler”: spraying water into the air in front of a fan keeps the room cooler because the heat evaporates the water instead of heating the air.
Cutting cooling costs
If you built your data center in the last few years to the latest design, you might have a power usage effectiveness (PUE) score of 1.6 or even 1.4, like the generation 2 data centers Microsoft was building in 2007 and 2008. That means you’re only using an extra 40-to-60 percent of the power it takes to run the servers and network to step down the voltage for the batteries in your uninterruptible power supply (UPS) and — mostly — keep the servers cool.
If you built your data center over a decade ago or you used a more conventional design, you’re using more like two to three times as much power for cooling as you use to actually run your workloads.
Microsoft’s change in thinking led to the generation 4 containerized data centers it built in 2011 (with the catchy name of ITPACs), using outside air, and adiabatic cooling on only the hottest days, bringing the PUE down to 1.2 or 1.12.
And the new generation 5 facility that’s just about to open in Quincy has a PUE of 1.1 (which drops even lower at the right time of year).
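The arithmetic behind those scores is simple: PUE is total facility power divided by the power that reaches the IT equipment, so the overhead for cooling and power conversion is the PUE minus one. The short sketch below applies that to the figures quoted above; the function names and the 1,000 kW IT load are made up for illustration and are not Microsoft numbers.

```python
def overhead_fraction(pue: float) -> float:
    """Non-IT power (cooling, power conversion) as a fraction of IT power.

    PUE = total facility power / IT power, so overhead = PUE - 1.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return pue - 1.0


def total_power_kw(it_load_kw: float, pue: float) -> float:
    """Total facility draw for a given IT load at a given PUE."""
    return it_load_kw * pue


if __name__ == "__main__":
    # PUE values quoted in the article; the 1,000 kW IT load is hypothetical.
    quoted = {
        "generation 2 (2007-2008), upper": 1.6,
        "generation 2 (2007-2008), lower": 1.4,
        "generation 4 ITPAC": 1.12,
        "generation 5": 1.1,
    }
    for label, pue in quoted.items():
        extra = overhead_fraction(pue)
        total = total_power_kw(1000.0, pue)
        print(f"{label}: PUE {pue} -> {extra:.0%} overhead, {total:.0f} kW total")
```

For the same IT load, moving from a PUE of 1.6 to 1.1 cuts the overhead from 60 percent to 10 percent of what the servers themselves draw.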
The ITPAC design crams a few thousand servers into a container. Microsoft gave its specifications to two large server OEMs and wanted them to deliver a container that it could plug in by hooking up one 440V power line and one network cable. The two companies came up with very different designs: One fit into a standard shipping container and had separate hot and cold aisles; the other was a custom pod with a single, shared hot aisle.
Both were delivered by lifting them onto the thick concrete base with a crane, and then Microsoft built a roof on top. The next day, four feet of snow drifted into the buildings. This didn’t cause any problems for the servers, but it was awkward to walk through, so they added louvered screens to keep the snow out but let the cold outside air in.
That air runs through multiple sets of filters to remove dust and dirt, and on the hottest days it’s cooled with a water spray before being blown through the containers.
Later versions of the ITPAC facility dispensed with the roof and walls altogether, protecting the power and network cables by running them underneath the containers or burying them under the concrete.
And unlike enterprise data centers, which have generators and even flywheels to keep the power on, the ITPACs aren’t connected to backup generators. The whole facility has multiple power sources, but if it loses power, the workload running on the ITPACs will automatically get switched over to other data centers. That’s not your typical failover either. Bakken calls it a “globally distributed geo-resilient system — it’s not a primary and a secondary, it’s a global mesh.”
The whole building is a container
The generation 5 facility Microsoft just finished building is three times the size of everything else on the Quincy site, and it goes back to buildings rather than stacks of containers, but they don’t look like a familiar data center. There’s no raised floor; just the same cement slab the ITPACs sit on. Tall racks arrive pre-populated with servers built to the Microsoft OCP design and roll out of the delivery truck and into position. They connect to a common signal backplane, “so they share cooling, network and power,” says Bakken. That gives Microsoft the flexibility to cope with different server types or different data center architectures.
An entire wall of fans at one side of the building blows air cooled by a closed loop water system using recycled water (or even rain water collected at the data center) that is cooled by the outside air rather than a chiller.
Generators that supplement the power line from the dam run on methane recovered from waste water collected on site. Microsoft is also looking into thin film solar and even researching fuel cells powered by natural gas that it could put in the racks.
It’s a long way from the average server room, and it’s the kind of hyperscale cloud data center that only two or three cloud providers can put together. And, of course, it’s not the only data center Microsoft has.
Microsoft runs more than 100 data centers around the world to deliver its 200 cloud services and handle cloud workloads for over one billion customers and more than 20 million businesses. “Everything from Xbox to Office 365 and Azure,” boasts Bakken. And he has one more demanding customer. “The only place you can run a production workload at Microsoft is one of my data centers.”
This story, "Inside a hyperscale data center (how different is it?)" was originally published by CIO.