Computerworld - Along with the growth in the amount of information available, the Web has undergone a radical shift from being a platform for distributing information among IT workers to a general tool for exchanging communications and data throughout society. This shift, along with the fact that material on the Web changes daily and may quickly disappear, has triggered an awareness that much information on the Web deserves to be archived for future use, and numerous libraries have undertaken Web harvesting and archiving projects.
The biggest of all such projects is the Wayback Machine, located at www.archive.org/web/web.php. It contains over 30 billion Web pages archived from 1996 onward. It's a terrific tool for looking into companies and organizations that may no longer exist and for seeing snapshots of Web pages long gone, offering a view of cyberspace in times past. The Wayback Machine doesn't have everything, but its scope is remarkable.
The U.S. Library of Congress has a program called Minerva (Mapping the Internet Electronic Resources Virtual Archive) aimed at collecting and preserving primary source materials created in digital formats (a.k.a. "born digital") that don't exist in any physical form. In the pilot program's first two years, the library has sponsored five event-based harvests of Web sites: Election 2000, Election 2002, Sept. 11, Sept. 11 Remembrance and Olympics 2002. The Minerva collection (www.loc.gov/minerva/) currently includes more than 35,000 Web sites consisting of more than 500 million Web pages.