The computer, once a tool for scientists, is becoming a collaborator

It's not just a tool serving science anymore. It's becoming a part of the science.

Mountains of Data

Roger Barga is a researcher at MSR who is developing tools for e-science, which he calls "in silico science -- science done inside the computer." He says two technological developments are driving e-science. "The first is that our ability to capture data -- through bigger machines, bigger colliders, more sensors and so on -- is outpacing our ability to analyze it by conventional means."

The second is the emergence of new tools for pattern recognition and machine learning -- algorithms that improve over time as they deal with more and more data, without human programming -- and other new ways to organize, access and mine vast amounts of data. For the Neptune ocean observatory, MSR is building a "scientific workflow workbench" on top of Microsoft Windows Workflow, to save, systematize and catalog all the data. It will help scientists visualize oceanographic data in real time and compose and conduct experiments.

The workbench project recognizes that analyzing data isn't enough. When data is distributed, complex and voluminous, just getting organized and keeping track of progress is a daunting job for the scientist. The days of microscope, pencil and notebook research are long gone.

Barga says e-science will profoundly affect the practice of science. "Scientists will have to ask themselves if they are theoreticians or bench scientists or one of these new computational scientists in their area. You'll see the branding of a new kind of scientist."

The availability of petabytes of data from the Internet will transform the practice of sciences involving human behavior, Kleinberg says. "For millennia, social interaction has been transient, ephemeral and essentially invisible to the standard techniques of scientific measurement," he says. "It's hard to go around measuring people's friendships and conversations, or why they make decisions. But now we have these digital trails that were never available before. Google is not just looking for simple correlations; all that data is being passed through very sophisticated probability models."

He says the vast data stores and analytical techniques now available mean that scientists no longer have to formulate detailed theories and models and then test them on experimental data. Sketchy ideas can be tried against the data, with the data and tools fleshing out the model, in effect collaborating with the researcher to develop a theory. "The mass of data lets you fill in the details whose broad outline you have created," he says. "Then you run massive amounts of data through it and discover that in the specifics, certain things matter much more than we thought and certain things much less."

Jeremy Gunawardena, director of the Virtual Cell Program at Harvard Medical School, says an emerging model of the cell likens it to a computer -- with inputs and outputs and logical decision-making processes.

"A number of biologists with significant stature in the field really feel this is the new way forward for biology," he says. "But we are still in the very early days."

This version of this article originally appeared in Computerworld's print edition.


Copyright © 2008 IDG Communications, Inc.
