Internet Archive, after Trump win, looks to create alternate site in Canada

The Archive is home to more than 15 petabytes of historical electronic data from the Web and other sources

Fearing that decades of digital data could be threatened by a Donald Trump presidency, the nonprofit Internet Archive today proposed a secondary electronic library in Canada to safeguard its historical information.

"This year, we have set a new goal: to create a copy of Internet Archive’s digital collections in another country. We are building the Internet Archive of Canada because, to quote our friends at LOCKSS, 'lots of copies keep stuff safe,'" Internet Archive founder Brewster Kahle said in a blog post today. LOCKSS is an open-source, peer-to-peer network that allows libraries to collect and share Web-based data.

As an organization, the Internet Archive has been a proponent of a free and open Internet, which it believes may be in jeopardy.

Kahle said an Internet Archive of Canada would help keep its cultural materials safe, private and perpetually accessible.

"It means preparing for a Web that may face greater restrictions. It means serving patrons in a world in which government surveillance is not going away; indeed it looks like it will increase," Kahle wrote. "Throughout history, libraries have fought against terrible violations of privacy—where people have been rounded up simply for what they read. At the Internet Archive, we are fighting to protect our readers' privacy in the digital world."

The Internet Archive, which also houses the Wayback Machine web-page repository, is home to more than 15 petabytes (15 million gigabytes) of online data. It is asking the public for donations to build the Internet Archive of Canada, which it said will cost millions of dollars.

"On November 9th in America, we woke up to a new administration promising radical change. It was a firm reminder that institutions like ours, built for the long-term, need to design for change," Kahle stated. "For us, it means keeping our cultural materials safe, private and perpetually accessible. It means preparing for a Web that may face greater restrictions."

The Internet Archive's Wayback Machine, which went live in 2001, is a digital time capsule that stores more than 150 billion archived versions of Web pages dating back to 1996, and it adds 750 million more each week.

Based in the Presidio in San Francisco, the Internet Archive and its Wayback Machine use an algorithm that repeats a Web crawl every two months in order to add new Web page images to the Archive's database. The algorithm first performs a broad crawl that starts with a few "seed sites," such as Yahoo's directory. After taking a snapshot of a site's home page, it moves on to the linked pages within that site until there are no more pages to capture. If those pages link to other sites, the algorithm automatically opens the links and archives that content as well.
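To make the crawl order concrete, here is a minimal sketch of one reading of that description: start from seed URLs, capture every page within a site, and queue off-site links to be crawled once the current site is exhausted. This is an illustration only, not the Archive's actual crawler. The `crawl` function, the `fetch_links` callback and the toy link map are hypothetical, and a real crawler would download and store each page rather than just record its URL.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urlsplit


def crawl(
    seeds: Iterable[str],
    fetch_links: Callable[[str], list[str]],
    max_pages: int = 1000,
) -> list[str]:
    """Site-first crawl: finish each site before following off-site links.

    fetch_links is a hypothetical stand-in for fetching a page and
    extracting its outgoing links. Returns URLs in capture order.
    """
    seen: set[str] = set()
    external: deque[str] = deque()  # seeds and off-site links, handled later
    captured: list[str] = []

    def enqueue(url: str, queue: deque[str]) -> None:
        # Queue each URL at most once across the whole crawl.
        if url not in seen:
            seen.add(url)
            queue.append(url)

    for seed in seeds:
        enqueue(seed, external)

    while external and len(captured) < max_pages:
        start = external.popleft()
        host = urlsplit(start).netloc
        site: deque[str] = deque([start])  # pages of the current site to capture
        while site and len(captured) < max_pages:
            url = site.popleft()
            captured.append(url)  # "snapshot" the page
            for link in fetch_links(url):
                # Same-site links are captured now; other sites wait their turn.
                target = site if urlsplit(link).netloc == host else external
                enqueue(link, target)
    return captured


# Hypothetical toy Web: each URL maps to the links found on that page.
toy_web = {
    "http://seed.example/": ["http://other.example/", "http://seed.example/a"],
    "http://seed.example/a": ["http://seed.example/"],
    "http://other.example/": ["http://other.example/b"],
    "http://other.example/b": [],
}
order = crawl(["http://seed.example/"], lambda url: toy_web.get(url, []))
```

Even though the seed's home page links to `other.example` first, the sketch captures `http://seed.example/a` before leaving the seed site, matching the "within the site until there are no more pages" step; a plain breadth-first crawl would interleave the two sites instead.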

The Internet Archive has data centers in Redwood City and Mountain View, Calif., and keeps not only snapshots of Web pages but also software, pictures, movies, audio clips and books.

The Internet Archive also works with about 100 physical libraries around the world whose curators help guide deep Internet crawls. The Archive's massive database is mirrored to the Bibliotheca Alexandrina, the new Library of Alexandria in Egypt, for disaster recovery purposes.

"For 20 years...we've backed you up. Now we ask for your help in return," Kahle stated. "The Web needs a memory, the ability to look back."


Copyright © 2016 IDG Communications, Inc.
