
Digg Dips Deep Into Open Source

By Eric Lai
April 30, 2007 12:00 PM ET

Computerworld - SANTA CLARA, Calif. -- Information technology staffers at Digg Inc. credit two particular features of the company's LAMP-based server cluster for helping its Digg.com news aggregation Web site maintain speedy performance in the face of rapid usage increases.

The site, which lets visitors vote on, or "digg," their favorite news stories hosted on other sites, recently passed the 1.2 million user mark, according to Elliot White III, an engineer at Digg who spoke at MySQL's user conference here last week.

Digg has about 100 servers that run a combination of Linux, the Apache Web server, the MySQL database and the PHP scripting language, all open-source technologies that are collectively referred to as LAMP. The systems, which are scattered in multiple data centers, include about 20 database servers, 30 Web servers and a few search servers running the open-source Lucene search engine. The rest of the systems operate as backup machines.

In Digg's architecture, a load balancer sends queries to PHP servers, MySQL slave servers that feed data to the PHP servers, and a MySQL master server that feeds data to the slaves. That's a fairly standard setup. But White said that to get away from sending raw queries against the database, the San Francisco-based company uses open-source memory caching software called Memcached.

First developed for use by LiveJournal Inc.'s online journaling Web site, Memcached stores chunks of data that can be pulled out and used to dynamically create a Web page. Conventional caching technologies, which store entire Web pages, would be too slow and inefficient for a site that changes continuously like Digg.com, White said.
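The idea of caching chunks of data rather than whole pages can be sketched as follows. This is only an illustration, not Digg's code: the `FragmentCache` class is a small in-memory stand-in for a Memcached client, and `load_story_from_db`, `get_story`, `render_front_page`, the key format and the 30-second expiry are all hypothetical.

```python
import time


class FragmentCache:
    """In-memory stand-in for a Memcached client (assumption: get/set with expiry)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._store[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = time.monotonic() + ttl if ttl is not None else None
        self._store[key] = (value, expires_at)


db_queries = 0  # counts how often the (simulated) database is hit


def load_story_from_db(story_id):
    """Hypothetical database lookup; a real site would run a SQL query here."""
    global db_queries
    db_queries += 1
    return {"id": story_id, "title": f"Story {story_id}", "diggs": 100 + story_id}


def get_story(cache, story_id):
    """Check the cache first and fall back to the database on a miss."""
    key = f"story:{story_id}"
    story = cache.get(key)
    if story is None:
        story = load_story_from_db(story_id)
        cache.set(key, story, ttl=30)  # short expiry because the data changes often
    return story


def render_front_page(cache, story_ids):
    """Assemble a page from individually cached chunks."""
    lines = []
    for story_id in story_ids:
        story = get_story(cache, story_id)
        lines.append(f"{story['title']} ({story['diggs']} diggs)")
    return "\n".join(lines)


cache = FragmentCache()
page1 = render_front_page(cache, [1, 2, 3])  # three cache misses
page2 = render_front_page(cache, [2, 3, 4])  # only story 4 is a miss
```

Two different pages share the cached stories 2 and 3, so the second page costs one database lookup instead of three. A whole-page cache could not reuse anything between the two.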

The other atypical feature of Digg's setup is its use of what engineer Tim Ellis called "sharding," a term apparently coined by developers at Google Inc. Sharding involves breaking a database into smaller parts to improve performance by isolating heavy workloads.

"If 90% of your data is within a certain range and you can get that part working really fast, you can help customers," Ellis said. "Then it's OK if the remaining 10% is slower."

A database can be sharded by table, date or range. The process is similar to partitioning but with some key differences, Ellis said. For example, sharding usually involves divvying up data onto different physical machines, but partitioning is typically done on the same piece of hardware.
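Sharding by range, with each range living on a different machine, might look roughly like the sketch below. The host names, the boundaries and the choice of story ID as the sharding key are assumptions made for illustration; the article does not say how Digg assigns data to shards.

```python
import bisect

# Hypothetical layout: shard i holds story ids below SHARD_UPPER_BOUNDS[i]
# (and at or above the previous bound). Each host is a separate machine.
SHARD_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]
SHARD_HOSTS = ["db-shard-0", "db-shard-1", "db-shard-2"]


def shard_for(story_id):
    """Return the host that owns the range containing story_id."""
    if story_id < 0:
        raise ValueError(f"invalid story id: {story_id}")
    # bisect_right finds the first upper bound strictly greater than story_id.
    index = bisect.bisect_right(SHARD_UPPER_BOUNDS, story_id)
    if index >= len(SHARD_HOSTS):
        raise ValueError(f"no shard covers story id {story_id}")
    return SHARD_HOSTS[index]
```

With this layout, `shard_for(999_999)` resolves to `db-shard-0` and `shard_for(1_000_000)` to `db-shard-1`. If most traffic hits the newest stories, the machine holding the newest range can be tuned separately from the others.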

Breaking a database into several smaller pieces can mean more work because of the inability to use common SQL commands, such as table joins, Ellis noted. "Developers don't like this crazy stuff," he said.
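The extra work comes from having to combine rows in application code once related tables live on different machines. The sketch below uses two separate in-memory SQLite databases as stand-ins for two shards; the table names, columns and sample rows are invented for the example.

```python
import sqlite3

# Two independent databases stand in for shards on different machines,
# so a single SQL JOIN across them is not possible.
users_shard = sqlite3.connect(":memory:")
users_shard.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
users_shard.executemany(
    "INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")]
)

stories_shard = sqlite3.connect(":memory:")
stories_shard.execute(
    "CREATE TABLE stories (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)"
)
stories_shard.executemany(
    "INSERT INTO stories VALUES (?, ?, ?)",
    [(10, 1, "Open source wins"), (11, 2, "LAMP at scale"), (12, 1, "Caching tips")],
)


def stories_with_authors():
    """Join stories to users in application code instead of in SQL."""
    stories = stories_shard.execute(
        "SELECT id, user_id, title FROM stories ORDER BY id"
    ).fetchall()
    user_ids = sorted({user_id for _, user_id, _ in stories})
    if not user_ids:
        return []
    # Second query goes to the other shard, fetching only the users we need.
    placeholders = ",".join("?" for _ in user_ids)
    rows = users_shard.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids
    ).fetchall()
    names = dict(rows)
    return [(title, names.get(user_id)) for _, user_id, title in stories]
```

What would be one `JOIN` on a single server becomes two queries plus a lookup table in memory, which is the kind of overhead developers tend to resist.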

Digg is really lucky in that 98% of the time, users are reading data rather than writing it to the server, Ellis noted. "Most people come to Digg's front page, read it and leave, which is kind of nice," he said, drawing laughs from the audience.
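A read-heavy workload suits the master and slave arrangement described earlier, because reads can be spread over many slaves while the few writes go to the single master. A minimal sketch of that routing, assuming a hypothetical `QueryRouter` class, made-up host names and a simple round-robin policy, could look like this:

```python
import itertools


class QueryRouter:
    """Send SELECTs to slaves in round-robin order and everything else to the master."""

    def __init__(self, master, slaves):
        if not slaves:
            raise ValueError("at least one slave is required")
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        words = sql.split(None, 1)
        if not words:
            raise ValueError("empty query")
        if words[0].upper() == "SELECT":
            return next(self._slaves)  # reads are spread across the slaves
        return self.master  # writes must go to the master


router = QueryRouter("db-master", ["db-slave-1", "db-slave-2"])
```

Adding read capacity then means adding slaves to the list. The sketch ignores replication lag, which a real router would have to consider for reads that must see a write made moments earlier.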



