IT on the 'Outer Limits'

An effort to identify deep-space signals by linking thousands of PCs into a virtual, massively parallel computer could have wide-ranging implications for business.

When you've got a big job - like searching the universe for signs of intelligent life - you need all the help you can get. That was the idea behind the May 1999 launch of SETI@home, an imaginative application of distributed computing that could have far-reaching implications - for business.

SETI@home, a project supported by the nonprofit SETI Institute in Mountain View, Calif., and other groups, has harnessed the Internet - and people's imaginations - to organize almost 2 million volunteer PCs into a virtual massively parallel computer.

The task: analyzing radio signals picked up by the Arecibo radio telescope in Puerto Rico - the one featured in the 1997 movie Contact. The goal: detecting the kind of deep-space radio signals that could indicate communication by other intelligence in the universe. The strategy: to use as many of the world's computers as possible together to accomplish the goal.

"The Internet lets us do that for first time in the history of computers," says David Anderson, the SETI team's distributed computing guru. "It lets us, in effect, make them into one big parallel supercomputer."

Moreover, the SETI@home software runs in the background or as a PC screen saver, so it doesn't interfere with users' normal computing tasks.

The search for extraterrestrial intelligence (SETI) may or may not find ET, but it has helped spur a change in thinking about the potential for distributed computing. Proponents say that linking computers through the Internet could enable long-term, computation-intensive tasks in aerodynamics, pharmacology, geophysics, biotechnology and manufacturing to be done in relatively little time.


How SETI@home Works

"The hypothesis is that if there are other intelligent beings, they probably use radio waves to communicate among themselves, in which case we would have some chance of hearing leakage of that communication in the same way our TV and radio waves leak out into space," says David Anderson, a computer scientist who works on the SETI@home project. "Or they might be sending an intentional signal with the express purpose of telling other beings like us that they're there," he adds.

Here's how the SETI scientists are harnessing the Internet to find such signals.

As the world's largest radio telescope, at Arecibo, Puerto Rico, slowly sweeps the sky, digital data are recorded on special magnetic tapes (donated by Fuji Tape Co.) at a rate of about 50 gigabytes a day.

The tapes are mailed to the University of California at Berkeley, where the data are transferred to three Enterprise 450 servers, donated by Sun Microsystems.

The servers chop the data into "work units" of about a third of a megabyte each, which are stored on a set of 500-gigabyte disks.
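
The chunking step can be sketched in a few lines. This is an illustrative model only; the names and the exact unit size are assumptions based on the article's "about a third of a megabyte," not SETI@home's actual code.

```python
# Hypothetical sketch: splitting a raw recording into independent,
# fixed-size "work units" that can each be processed on its own.
WORK_UNIT_BYTES = 350_000  # roughly a third of a megabyte, per the article

def make_work_units(data, unit_size=WORK_UNIT_BYTES):
    """Chop a raw byte stream into independent fixed-size chunks."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]

tape = bytes(1_000_000)        # stand-in for part of one day's recording
units = make_work_units(tape)
print(len(units), len(units[-1]))  # 3 units; the last holds the remainder
```

Because every chunk is self-contained, each one can be shipped to a different volunteer PC with no coordination between them.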

Volunteers first download the SETI client software from the Internet onto Windows or Macintosh PCs, where it acts as a screen saver, starting when they're not using their computers. Versions for Unix and Linux run in the background at low priority all the time. Volunteers then connect to the SETI server via the Internet and receive a work unit, which downloads in just a few minutes, even over a 28.8Kbit/sec. modem. They then disconnect, and the PC processes the data in its spare time. The processing is complex, taking about 20 hours on a 400-MHz Pentium.
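
That fetch/crunch/report cycle can be sketched as follows. Everything here is a hypothetical stand-in: the real client speaks to a real server over the network, throttles itself to idle time, and spends about 20 hours inside what this sketch calls analyze().

```python
# Illustrative model of the SETI@home client cycle: connect briefly to
# fetch a work unit, crunch it offline, then reconnect to swap the
# results for the next unit.

def analyze(work_unit: bytes) -> list:
    """Placeholder for the long signal-processing pass; returns candidates."""
    return []

def client_cycle(fetch, report, cycles: int = 1):
    for _ in range(cycles):
        unit = fetch()           # brief connection: download one work unit
        results = analyze(unit)  # long offline crunch in the PC's spare time
        report(unit, results)    # reconnect, exchange results for more work

# Toy in-memory "server" to demonstrate the flow:
queue = [b"unit-1"]
done = []
client_cycle(fetch=queue.pop, report=lambda u, r: done.append((u, r)))
print(done)  # [(b'unit-1', [])]
```

The key design point is that the network connection is only needed at the start and end of each cycle; the hours of computation in between happen entirely offline.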

A work unit represents a strip of the sky about the width of the moon and one-tenth its height, containing a frequency band of 10,000 hertz. Radio waves in nature are spread over many frequencies and come across as noise rather than a discrete wave, so the computer combs through that band looking for narrow-frequency radio waves - like the transmission from a commercial radio station at 90.1 MHz on the FM dial, for example. Such waves may indicate a transmission.
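
The idea behind the narrowband search can be illustrated with a toy power-spectrum scan. This is a minimal sketch, not SETI@home's algorithm (which also handles Doppler drift, chirped signals and pulses); the function names, the detection threshold and the sample sizes are all illustrative assumptions.

```python
# Toy narrowband detector: natural noise spreads power across the whole
# band, while an artificial carrier piles power into one frequency bin.
import cmath
import math
import random

def power_spectrum(samples):
    """Pure-Python DFT power spectrum (real input, first n/2 bins)."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))) ** 2
            for k in range(n // 2)]

def narrowband_peak(samples, threshold=20.0):
    """Bin index of a narrowband spike, or None if the band is just noise."""
    power = power_spectrum(samples)
    mean = sum(power) / len(power)
    k = max(range(len(power)), key=power.__getitem__)
    return k if power[k] > threshold * mean else None

random.seed(0)
n = 256
noise = [random.gauss(0, 1) for _ in range(n)]
# the same noise plus a pure tone at bin 20 -- a "narrowband transmission"
carrier = [x + 5 * math.sin(2 * math.pi * 20 * i / n)
           for i, x in enumerate(noise)]

print(narrowband_peak(noise))    # None: power is spread across the band
print(narrowband_peak(carrier))  # 20: the injected tone stands out
```

In broadband noise, no single bin dominates the average; a coherent carrier concentrates its energy into one bin, which is what makes it stand out as a candidate.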

When the work unit is finished, the software produces a short list of narrow-frequency candidate signals. Then it reconnects to the SETI server and exchanges the work unit and results for another work unit.

Because our civilization is constantly leaking "radio garbage" into space, it's difficult to say whether a detected signal comes from us or from them. The candidate signals are put into a database where they can be examined and compared, but the best way to determine whether a signal is from space is to look for the same signal from the same point in the sky at two different times. SETI@home is just reaching the stage where it will have results from multiple runs through the sky to enable that kind of analysis.
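
That cross-check amounts to matching candidates from two sky passes. The sketch below assumes a hypothetical record layout (right ascension, declination, frequency) and made-up tolerances; the real comparison is done against the project's candidate database.

```python
# Illustrative cross-pass check: a candidate is more credible if the same
# frequency shows up at the same sky position on two separate passes.

def recurring(candidates_pass1, candidates_pass2,
              pos_tol=0.1, freq_tol=1.0):
    """Pair up (ra, dec, freq_hz) candidates seen in both sky passes."""
    matches = []
    for ra1, dec1, f1 in candidates_pass1:
        for ra2, dec2, f2 in candidates_pass2:
            if (abs(ra1 - ra2) <= pos_tol and abs(dec1 - dec2) <= pos_tol
                    and abs(f1 - f2) <= freq_tol):
                matches.append(((ra1, dec1), f1))
    return matches

pass1 = [(201.3, -12.1, 1420405800.0), (88.0, 5.5, 1421000000.0)]
pass2 = [(201.35, -12.08, 1420405800.4)]   # only the first one repeats
print(recurring(pass1, pass2))             # one recurring candidate
```

A signal that recurs at a fixed celestial position while the Earth rotates underneath is much harder to explain as local interference.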


Using the Internet as a massively parallel computer suddenly makes possible goals that were once shelved as impractical, Anderson says. "There may be some analysis you want to do, and you see it will take 100,000 years of computer time, so you would throw away that idea," he explains. But in one year, SETI@home has used more computer time than that. "So those ideas can be taken out of (the) wastebasket and reconsidered," he says.

Potential users include energy companies that need to do seismic or geographic analyses before they start drilling for oil or digging for coal, manufacturers that do structural analysis or study fluid dynamics prior to transforming a design from a computer model into the real equipment, and engineering firms that stress-test everything from bridges to aircraft.

The basic idea is simple, says Dave McNett: "It's all based on not wasting the resource - running distributed software on your machine and letting it use whatever resources you aren't using."

McNett is president of Distributed.net, a Birmingham, Ala.-based nonprofit research foundation founded in 1997 to compete in an encryption-breaking contest. The group has grown to 20 developers and has rallied a 190,000-machine network (93% are PCs) to break code and solve mathematical puzzles for fun and prizes.

These kinds of networks can accomplish a great deal, McNett says, because 90% of most computers' processing power goes unused. "During the day, most PCs spend most of their time flying tiny toasters around," he says. Even when computers are in use, the majority of tasks aren't CPU-intensive. Working in a spreadsheet, for example, is CPU-intensive only when the columns are computed. "CPUs are used only in short bursts," McNett says. "And that's not even mentioning 6 p.m. to 9 a.m. and weekends and holidays."

Application Limits

Massively parallel computing "does make sense for use in the oil industry, and we have used the technique (internally) for some of our computationally intensive problems," says John M. Old, director of information management for worldwide exploration and production at Texaco Inc. in Houston.

But distributed computing isn't for every job. "The SETI project lends itself to breaking the data into small, independent chunks, which makes the parallel computing fairly simple," Old explains. Unfortunately, not all data can be segmented that way, and many projects require complex communication among processors.

McNett acknowledges that there are plenty of things an IBM RS/6000 can do that a distributed network can't. "We can't do anything that's more data-intensive than CPU-intensive," he explains. For example, weather prediction is difficult because the data is highly interrelated. Distributed computing is better at jobs such as animation rendering, in which each of the 30 frames per second that go into a movie like Toy Story is a separate task that can be distributed among thousands of computers.
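
The rendering case is what's often called "embarrassingly parallel," and a minimal sketch of it fits in a few lines. Here local worker processes stand in for machines on a network, and render_frame() is a hypothetical placeholder for a real renderer.

```python
# Illustrative sketch: every frame is an independent task, so frames can
# be farmed out to as many workers as are available, with no
# communication between them.
from multiprocessing import Pool

def render_frame(frame_number: int) -> str:
    """Stand-in for hours of ray tracing on one independent frame."""
    return f"frame_{frame_number:04d}.png"

if __name__ == "__main__":
    with Pool(processes=4) as pool:               # 4 local "machines"
        frames = pool.map(render_frame, range(30))  # one second of film
    print(frames[0], frames[-1])  # frame_0000.png frame_0029.png
```

Weather models fail this test: each grid cell's next state depends on its neighbors, so the processors would spend their time exchanging data rather than computing.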

With those kinds of jobs in mind, the folks at Distributed.net are considering a commercial spin-off. At present, Distributed.net's machines are equivalent to 42 of the 144-node RS/6000s, the fastest computers on the market, at a net cost of about $120 million (based on the floating-point speed of the RS/6000 and the Pentium II/266 PC, the average computer on the distributed network). "We're proud of that," McNett says, "but the potential number of machines dwarfs what we have now."

If the SETI project rallied 2 million computers by word of mouth, imagine what a company that was willing to pay for your PC's time might accomplish. That's exactly what Jim Albea, chief operating officer at ProcessTree Network in Madison, Ala., was thinking in January when he set up a Web site soliciting computers for the April launch of what he claims is the first commercial venture in the field (www.processtree.com).

But despite the potential, there are problems that have to be solved before massively parallel Internet computing can work commercially, McNett says. The biggest hurdle is security. An oil exploration company considering the mineral rights to some land might gain a lot of efficiency by divvying up the analysis of the geologic data across the Internet. But what's to stop a competitor from setting up machines in the network and gleaning some insights from the data?

And what about would-be saboteurs in the network, bent on ruining a project for competitive or malicious reasons? "There has to be a security model that is very easy, that doesn't allow a client machine to gain more insight than it should on the nature of a task and that can assure that no one client machine has enough grasp of the project that it can adversely affect the result," McNett says.

Another concern is that if people can modify the software's behavior, they can affect the project's integrity. SETI@home ran into this problem when some volunteers tweaked the software to improve its speed. Despite the users' good intentions, SETI scientists had to throw out the resulting radio-wave analyses because they couldn't vouch for their accuracy.

Finally, McNett says, massively distributed computing calls for a business model that has yet to gel. "Are you going to send 18-cent checks to 100,000 people every month?" he asks.

Albea says he thinks ProcessTree has solved most of the technical and business problems. For security, he plans to combine encryption with pieces of data so small that they would yield no useful information even if they were decoded. It may also randomly duplicate jobs and check for identical results. A discrepancy would indicate an error or sabotage.
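
The duplication check Albea describes can be sketched simply: run the same work unit on two randomly chosen volunteers and accept the result only if they agree. The function names and the worker model below are illustrative assumptions, not ProcessTree's design.

```python
# Hedged sketch of redundant dispatch: disagreement between two
# independently computed copies of the same job flags an error or
# a saboteur.
import random

def checked_dispatch(work_unit, workers, rng=random):
    """Run one unit on two randomly chosen workers; flag disagreement."""
    w1, w2 = rng.sample(workers, 2)   # two distinct volunteers
    r1, r2 = w1(work_unit), w2(work_unit)
    if r1 != r2:
        return None, "discrepancy: error or sabotage"
    return r1, "ok"

def honest(unit):
    return sum(unit)        # the "real" computation

def saboteur(unit):
    return sum(unit) + 1    # quietly corrupts the result

print(checked_dispatch([1, 2, 3], [honest, honest]))    # (6, 'ok')
print(checked_dispatch([1, 2, 3], [honest, saboteur]))  # flagged
```

Duplicating only a random sample of jobs, as Albea suggests, trades some of this assurance for throughput: a cheater can't know which of its results will be double-checked.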

Despite these precautions, Albea says security concerns will probably initially scare off some potential customers. He also notes that computer owners may have concerns of their own, but he points to SETI's ability to overcome user misgivings. "It gets down to trusting that we're a viable business with no interest in rifling their files," he says.

Meanwhile, even though ProcessTree hasn't yet set a pricing plan, CEO Steve Porter offers a ballpark figure of about $1,000 for the equivalent of a year's worth of CPU power from a Pentium II/400.

The company may pay in the range of $10 to $20 per month per computer - and even more for large-volume volunteers such as businesses. Payment will likely be in credits with an online retailer or service. For example, a participant might get discounts on his Internet service in exchange for running the software. "They're not going to be able to retire on this," Albea says, "but it's a resource just doing nothing, and instead they can be getting credits."

Since its site debuted in January - with virtually no advertising - ProcessTree has lined up more than 35,000 users representing more than 70,000 machines. "We are the largest body of available commercial computing power in the world right now," Porter says. "You can't get anything that can go faster than we can, and we get faster every day."

Copyright © 2000 IDG Communications, Inc.
