Google offers tips on reducing latency on large-scale systems
In the latest ACM magazine, Google fellows offer a few secrets to keeping Web systems responding to users as quickly as possible
IDG News Service - Running the world's most popular website, Google engineers know a thing or two about keeping a site responsive under very high demand. In the latest issue of the ACM (Association for Computing Machinery) monthly magazine, Google reveals a few secrets to maintaining speedy operations on large-scale systems.
Systems as large as Google's can suffer from even a few sluggish individual nodes, write the article's authors, Jeffrey Dean, a Google fellow in the company's systems infrastructure group, and Luiz André Barroso, a Google fellow who is technical lead of Google's core computing infrastructure. The good news is that while slow nodes can never be eliminated entirely, a system can be designed to still offer speedy service to the user, the authors wrote.
"It's an important topic. When you have a [user] request that needs to gather information from many machines, inherently some of the machines will be slow," said Ion Stoica, an ACM reviewer who is a computer science professor at the University of California, Berkeley, as well as a co-founder of video stream optimization software provider Conviva.
"As [Internet services] try to reduce the response times more and more, the problem will become more difficult because [the systems] will have less time to decide what to do when something goes wrong. So it will be an area of research and development that will get attention over the next few years," he said.
Looking at performance variability is particularly important with large distributed systems such as Google's, because performance troubles on even a single node can result in delays that affect many users. "Variability in the latency distribution of individual components is magnified at the service level," the authors wrote.
For instance, consider a server that typically responds to a request within 10 milliseconds but takes an entire second to fulfill a request every 100th time. In a single server environment, this means that only every 100th user would get a slow response. But if each user request is handled by 100 servers -- each with the same latency characteristics -- then 63 out of every 100 users would get a slow response, the authors calculated.
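The 63-in-100 figure follows from simple probability. As a rough sketch, assuming each server is slow independently of the others and that a request must wait for every one of the servers it touches (the function name `fraction_slow` is illustrative, not taken from the article), the arithmetic can be reproduced like this:

```python
def fraction_slow(p_slow: float, fanout: int) -> float:
    """Chance that at least one of `fanout` servers gives a slow response.

    Assumes servers are slow independently of one another and that the
    user's request cannot complete until every server has answered.
    """
    # Probability that all servers are fast, subtracted from 1.
    return 1.0 - (1.0 - p_slow) ** fanout


# One request in 100 is slow on any single server.
print(f"1 server:    {fraction_slow(0.01, 1):.1%}")    # 1.0%
print(f"100 servers: {fraction_slow(0.01, 100):.1%}")  # 63.4%
```

Under these assumptions, the chance that all 100 servers answer quickly is 0.99 raised to the 100th power, or roughly 37 percent, which leaves about 63 percent of requests waiting on at least one slow server.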
Performance variability can take place for a number of reasons, the authors note. Sharing resources, such as running multiple applications on a single server, can affect the response time of each application. The length of a component's work queue may also be a factor, as would routine maintenance jobs that can take up resources.
The Google engineers offered a number of techniques for mitigating slow performance from individual nodes, such as breaking jobs into smaller components and better managing routine maintenance tasks.