Blogs Subject to Selective Service for Digital Library

June 11, 2007 12:00 PM ET

Computerworld -

WASHINGTON -- Blogs are being started and abandoned at volcanic rates. Nonetheless, bloggers are creating a massive chronicle of daily life, filled with stories and, of course, rants.

It’s a potentially important record of our time for future generations, one that the Library of Congress is interested in preserving. But as with other forms of digital data, the Washington-based library can’t hope, and really doesn’t want, to save all of the content being published in blogs, according to Laura Campbell, associate librarian for strategic initiatives.

Campbell, who received the 2007 EMC Information Leadership Award at last week’s Computerworld Honors Program ceremony, is also the director of the National Digital Library Program. Through that and other programs, the Library of Congress is working to collect and preserve so-called born-digital data that originates on the Web and to digitize other information for online access, particularly in educational settings.

Laura Campbell

Photo by Asa Mathat

The library is currently managing about 295TB of digital data, not all of it taken from the Internet. Campbell, who is in charge of developing strategies for preserving the electronic data, said the library and its partners, including 12 other national libraries, are selective about what they choose to capture and store. “I don’t think you would want to save most of what’s produced,” she said.

Personal blogs are part of the digital data mix in the library’s collection. Campbell described their inclusion as a continuation of earlier data-collection efforts predating the Internet era. “We have the story of the common person at any time in history,” she said, adding that the library is also collecting podcasts and information posted on social networking Web sites.

But there are limits to how much online data can be archived. “We are doing a sampling and go get blogs on certain subject areas that we have chosen and selected,” Campbell said. “It won’t be everything by any stretch.”

Campbell said the library has worked with its partners to develop software tools that can help automate the process of collecting material from the Internet.

But as the collections work proceeds, improvements in the process are ongoing as well. “We’re learning by doing,” said Campbell, who described the current approach used by library workers as an iterative process of continuing assessments and adjustments. 
