
How Digg.com uses the LAMP stack to scale upward

Caching and 'sharding' data speeds up the social media Web site

By Eric Lai
April 24, 2007 12:00 PM ET

Computerworld - Digg.com credits two particular features of its LAMP (Linux, Apache, MySQL and PHP) server cluster for helping the news aggregation site maintain speedy performance in the face of high growth.

The site, which lets its users vote on, or "digg," their favorite news stories hosted on other sites, recently passed the 1.2 million-user mark, according to Elliot White III, an engineer at San Francisco-based Digg Inc. He spoke at MySQL’s annual conference in Santa Clara, Calif., on Tuesday.

Today, Digg.com boasts 100 servers scattered in multiple data centers that host a total of 30GB of data, but the site started off in late 2004 as a single Linux server running Apache 1.3, PHP 4, and MySQL 4.0 using the default MyISAM storage engine, White said.

As more users dug Digg, the site moved to an architecture that uses a load balancer in the front that sends queries to PHP servers, MySQL slave servers that feed the PHP servers, and a MySQL master server that feeds data to the slaves.
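The read/write split in that layout can be sketched in a few lines of Python. This is an illustration only, not Digg's code: the `ReplicatedPool` class and the server names are hypothetical, and the rule "SELECT goes to a slave, everything else goes to the master" is a simplifying assumption.

```python
import itertools


class ReplicatedPool:
    """Hypothetical router: reads go to slaves in turn, writes go to the master."""

    def __init__(self, master, slaves):
        if not slaves:
            raise ValueError("at least one slave is required")
        self.master = master
        # Cycle through the slaves so read load is spread evenly.
        self._slaves = itertools.cycle(list(slaves))

    def server_for(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        # Assumption: only plain SELECT statements are safe to run on a slave.
        if verb == "SELECT":
            return next(self._slaves)
        return self.master


# Server names below are made up for the example.
pool = ReplicatedPool("db-master", ["db-slave-1", "db-slave-2"])
print(pool.server_for("SELECT title FROM stories WHERE id = 7"))
print(pool.server_for("UPDATE stories SET diggs = diggs + 1 WHERE id = 7"))
```

A production setup would also have to account for replication lag, since a slave may briefly return data older than what the master holds.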

That's a fairly standard setup. But to get away from "sending raw queries against the database," White said Digg.com uses software called Memcached. First developed for use by the LiveJournal site, Memcached is tailored for dynamic sites like Digg.com, which serve Web pages with content that is constantly changing and is personalized according to user preferences, White said.

Memcached stores chunks of data that can be pulled and used to dynamically create a Web page. Conventional caching systems, which store whole Web pages, would be too slow and inefficient for a site like Digg.
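The idea of caching chunks of data rather than whole pages can be sketched as follows. This is a minimal illustration, not Digg's implementation: `FragmentCache` is an in-memory stand-in for a Memcached client, and `FakeDatabase`, `get_story`, `render_page` and the 60-second expiry are all hypothetical.

```python
import time


class FragmentCache:
    """In-memory stand-in for a Memcached client (hypothetical)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired entries count as misses
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


class FakeDatabase:
    """Stand-in for MySQL that counts how many queries reach it."""

    def __init__(self):
        self.calls = 0

    def load_story(self, story_id):
        self.calls += 1
        return {"id": story_id, "title": f"Story {story_id}", "diggs": 42}


def get_story(cache, db, story_id):
    # Look in the cache first; query the database only on a miss.
    key = f"story:{story_id}"
    story = cache.get(key)
    if story is None:
        story = db.load_story(story_id)
        cache.set(key, story, ttl_seconds=60)
    return story


def render_page(cache, db, story_ids, username):
    # The page is personalized per user, but the story chunks are shared.
    lines = [f"Hello, {username}"]
    for story_id in story_ids:
        story = get_story(cache, db, story_id)
        lines.append(f"{story['title']} ({story['diggs']} diggs)")
    return "\n".join(lines)
```

Two users viewing overlapping sets of stories get different pages, yet each story is fetched from the database only once while its cache entry is live. A whole-page cache could not share that work across personalized pages.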

The other atypical feature of Digg’s setup is its use of what Tim Ellis, another Digg engineer, calls "sharding." 

A term apparently coined by Google engineers, sharding involves breaking a database into smaller parts in order to isolate heavy loads for better performance.

"If 90% of your data is within a certain range, and you can get that part working really fast, then you can help customers," Ellis said. "Then it’s OK if the remaining 10% is slower."

A database can be sharded by table, date or range. It is similar to partitioning, says Ellis, but with several key differences. Sharding usually involves divvying up data onto different physical machines. Partitioning, in contrast, typically occurs on the same piece of hardware. And while MySQL does not natively allow sharding, it does support partitioned tables, federated tables and clusters.
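Sharding by range can be sketched as a lookup that maps a key to the machine holding it. The example below is an assumption-laden illustration, not Digg's scheme: the `RangeShardRouter` class, the boundary values and the host names are all made up.

```python
import bisect


class RangeShardRouter:
    """Hypothetical router that picks a shard from the range a key falls in."""

    def __init__(self, upper_bounds, shard_names):
        # upper_bounds[i] is the exclusive upper bound of shard i;
        # the last shard takes every key at or above the final bound.
        if len(shard_names) != len(upper_bounds) + 1:
            raise ValueError("need exactly one more shard than boundaries")
        if list(upper_bounds) != sorted(upper_bounds):
            raise ValueError("boundaries must be in ascending order")
        self.upper_bounds = list(upper_bounds)
        self.shard_names = list(shard_names)

    def shard_for(self, key):
        index = bisect.bisect_right(self.upper_bounds, key)
        return self.shard_names[index]


# Story IDs below 1,000,000 live on one machine, the next million on a
# second, and everything newer on a third (values are illustrative).
router = RangeShardRouter(
    [1_000_000, 2_000_000],
    ["db-shard-0", "db-shard-1", "db-shard-2"],
)
print(router.shard_for(999_999))
print(router.shard_for(1_000_000))
print(router.shard_for(5_000_000))
```

Putting the most heavily used range on its own machine is one way to get the effect Ellis describes, where the busiest part of the data is made fast and the rest can be slower.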

Digg only recently began sharding. While sharding is helping Digg.com achieve much faster performance overall, breaking a database into several smaller ones increases complexity, Ellis said. That can mean more work for developers and database administrators, because of the inability to use common SQL commands such as joining tables. "Developers don’t like this crazy stuff. That can create pushback," he said.
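The extra developer work can be pictured with a small sketch. When two tables sit on different machines, a SQL join between them is no longer available, so the application fetches rows from each shard and combines them itself. The function and the row layout below are hypothetical, and the lists stand in for query results from two separate servers.

```python
def join_stories_with_users(story_rows, user_rows):
    """Combine rows from two shards in application code (inner-join behavior)."""
    # Index one side by its key, as a database hash join would.
    users_by_id = {user["id"]: user for user in user_rows}
    joined = []
    for story in story_rows:
        user = users_by_id.get(story["user_id"])
        if user is not None:  # drop stories whose submitter was not fetched
            joined.append({"title": story["title"], "submitter": user["name"]})
    return joined


# Rows as they might come back from two different database servers.
stories = [
    {"title": "LAMP at scale", "user_id": 1},
    {"title": "Cache everything", "user_id": 2},
    {"title": "Orphaned story", "user_id": 9},
]
users = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]

for row in join_stories_with_users(stories, users):
    print(row["title"], "by", row["submitter"])
```

What was a single query becomes two round trips plus merge logic that developers have to write and maintain.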


