Internet Archive to unveil massive Wayback Machine data center

The Wayback Machine stores 85 billion Web pages dating back to '96

March 19, 2009 12:00 PM ET

Computerworld - The Internet Archive plans to announce next week the opening of a new data center to house three petabytes of information for its Wayback Machine, the digital time capsule that stores archived versions of Web pages dating back to 1996.

For example, this is what Computerworld's Web site looked like in 1997, what Google looked like in 1998 and what CNN looked like in 2000.

The Wayback Machine houses 85 billion Web pages archived for more than a dozen years, which amounts to three petabytes of data, or about 150 times the content of the Library of Congress. Only five years ago, the Wayback Machine contained about 30 billion Web pages. It is expected to continue to grow by 100TB of data per month now that it's live.
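To put those figures in perspective, the short sketch below works through the arithmetic they imply. It uses only the numbers quoted above. The decimal units (1PB = 1,000TB), the function names and the assumption that all three petabytes are Web-page data are illustrative choices, not details from the Internet Archive, so the results are rough estimates.

```python
# Figures quoted in the article.
PAGES_NOW = 85_000_000_000             # Web pages archived today
PAGES_FIVE_YEARS_AGO = 30_000_000_000  # Web pages archived five years ago
ARCHIVE_PB = 3                         # archive size in petabytes
GROWTH_TB_PER_MONTH = 100              # expected monthly growth in terabytes

# Assumption: decimal storage units (1 PB = 1,000 TB = 10**15 bytes).
TB_PER_PB = 1000
BYTES_PER_PB = 10**15


def average_page_kb(archive_pb, pages):
    """Average stored size per page in kilobytes (1 KB = 1,000 bytes)."""
    return archive_pb * BYTES_PER_PB / pages / 1000


def months_to_add_current_size(archive_pb, growth_tb_per_month):
    """Months needed to add as much data again at a constant monthly rate."""
    return archive_pb * TB_PER_PB / growth_tb_per_month


def pages_added_per_year(pages_now, pages_before, years):
    """Average number of pages added per year over the period."""
    return (pages_now - pages_before) / years


if __name__ == "__main__":
    print(round(average_page_kb(ARCHIVE_PB, PAGES_NOW), 1), "KB per page")
    print(months_to_add_current_size(ARCHIVE_PB, GROWTH_TB_PER_MONTH), "months")
    print(pages_added_per_year(PAGES_NOW, PAGES_FIVE_YEARS_AGO, 5), "pages per year")
```

Under those assumptions, the archive averages roughly 35KB per stored page, has added about 11 billion pages a year over the past five years, and would add another three petabytes in about 30 months at the projected rate.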

The Internet Archive's massive database is mirrored to the Bibliotheca Alexandrina, the new Library of Alexandria in Egypt, for disaster recovery purposes.

According to an event invitation from Sun Microsystems Inc., the Internet Archive is moving from a traditional data center filled with standard Linux servers to one that runs Solaris 10 with ZFS on Sun Fire X4500 servers inside a Sun Modular Datacenter. The modular system is an all-in-one data center housed in a metal shipping container for mobility.

Because of the modular design, Sun said the data center was deployed in a tenth of the time it would take to build a typical bricks-and-mortar data center. The Wayback Machine's Sun Modular Datacenter can handle 500 queries a second, Sun said. A spokesperson for the Internet Archive said the user interface on the Wayback Machine will not change.

The Internet Archive is a nonprofit organization located in the Presidio in San Francisco, with data centers in Redwood City and Mountain View, Calif. The archive not only keeps snapshots of Web pages, but also software, movies, books, and audio clips.

Users can surf the Wayback Machine by typing in the address of a Web site or Web page and then choosing from a series of dates that correspond to the stored snapshots. The site does not currently support keyword search.
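For readers who would rather jump straight to a snapshot, the address-plus-date lookup can also be expressed as a direct link. The sketch below is a minimal illustration, not something described by the Internet Archive in this article: it assumes the commonly used address pattern of a date stamp followed by the original address under web.archive.org/web/, and the function name `wayback_url` is hypothetical. The service typically redirects such a link to the capture closest to the requested date.

```python
from datetime import date


def wayback_url(page_url, year, month=1, day=1):
    """Build a Wayback Machine link for a page near the given date.

    Assumption: the address pattern
    https://web.archive.org/web/<YYYYMMDD>/<original address>
    is not taken from the article.
    """
    # date() raises ValueError for an impossible date such as February 30.
    stamp = date(year, month, day).strftime("%Y%m%d")
    return f"https://web.archive.org/web/{stamp}/{page_url}"


if __name__ == "__main__":
    # The three examples mentioned earlier in the story.
    print(wayback_url("http://www.computerworld.com/", 1997))
    print(wayback_url("http://www.google.com/", 1998))
    print(wayback_url("http://www.cnn.com/", 2000))
```

The sketch only builds the link; it does not check whether a snapshot exists for that date.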


