
System, Cure Thyself

Self-healing software and hardware are on the way.

January 12, 2004 12:00 PM ET

Computerworld - Some years ago at an insurance company, a server's file-locking process kept failing, and the vendor couldn't produce a patch to prevent it from happening. As a result, a file could be accessed by more than one user at a time.
The company's IT administrator ultimately wrote a custom script that simply restarted the locking process every time it failed, which happened about every 10 minutes. "It was better than having several hundred users mad at me," recalls the administrator, Nick van der Zweep, now the director of virtualization and utility computing at Hewlett-Packard Co.
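Van der Zweep's workaround is what is usually called a watchdog or supervisor. The article does not show his script, so the following is only a minimal Python sketch of the idea. It assumes the locking service can be started as an ordinary command that exits when it fails. The `supervise` function and the `lockd` command in the usage line are hypothetical.

```python
import subprocess
import sys
import time

def supervise(cmd, max_restarts=None, delay=1.0):
    """Run cmd and start it again every time it exits with a failure status.

    Returns the number of restarts performed, either when the process exits
    cleanly or when max_restarts is reached (None means restart forever).
    """
    restarts = 0
    while True:
        proc = subprocess.run(cmd)  # blocks until the process exits
        if proc.returncode == 0:
            return restarts  # clean exit: nothing to heal
        if max_restarts is not None and restarts >= max_restarts:
            return restarts  # give up and let a human take over
        restarts += 1
        print(f"process exited with {proc.returncode}; restart #{restarts}",
              file=sys.stderr)
        time.sleep(delay)  # brief pause so an instant crash cannot spin the CPU
```

Called as `supervise(["/usr/sbin/lockd"])` (a hypothetical path), the loop would keep the service running indefinitely. Like the original script, it treats the symptom rather than diagnosing the cause.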
Van der Zweep's custom code was an early example of self-healing software, a category that has drawn the attention of researchers and of vendors such as HP, IBM and Computer Associates International Inc. Many other companies are also actively researching and developing self-healing capabilities for their products.
For example, there are already products on the market that automatically correct, or self-heal, components or subsystems such as servers that have reached capacity. (In that case, a program can add more servers or more blades automatically.) But the focus over the next two to five years will be on developing entire networks and systems that self-heal across combinations of applications, storage and computing resources, say analysts and researchers.
Self-Aware Computing
Definitions of self-healing vary widely. "Self-healing ... connotes that when there are problems in the infrastructure, the infrastructure copes with them," says Alan Ganek, vice president of autonomic computing at IBM.
For example, Ganek says, when IBM ran the Web site for the U.S. Open tennis tournament in September, software handled workload spikes by delivering computing power from a new server to keep service levels high.
"Self-healing is the capability of any piece of technology to monitor itself and self-diagnose a problem, and then to start a solution that either bypasses or corrects the problem," says Jean-Pierre Garbani, an analyst at Forrester Research Inc. in Cambridge, Mass. For example, HP has products that can detect that a processor is about to fail by noticing single-bit memory errors in its cache, and that can then automatically turn on another HP processor at the customer site.
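The HP example follows a common pattern: treat a burst of correctable errors as a warning and fail over before an uncorrectable one occurs. The Python sketch below illustrates that pattern only. The `ProcessorMonitor` class, the threshold of three errors and the 60-second window are hypothetical choices, not HP's actual policy.

```python
from collections import deque

class ProcessorMonitor:
    """Retire a processor that shows a burst of correctable single-bit cache errors.

    Hypothetical policy: more than `threshold` errors within `window` seconds
    is treated as a sign of impending failure, and a spare is switched on.
    """

    def __init__(self, active, spares, threshold=3, window=60.0):
        self.active = set(active)     # processors currently in service
        self.spares = list(spares)    # idle processors that can be switched on
        self.retired = set()          # processors taken out of service
        self.threshold = threshold
        self.window = window
        self.errors = {}              # processor id -> deque of error timestamps

    def record_single_bit_error(self, cpu, timestamp):
        """Log one corrected error; return the spare that was activated, or None."""
        if cpu not in self.active:
            return None
        times = self.errors.setdefault(cpu, deque())
        times.append(timestamp)
        # Forget errors that fall outside the sliding window.
        while timestamp - times[0] > self.window:
            times.popleft()
        if len(times) > self.threshold and self.spares:
            spare = self.spares.pop(0)
            self.active.discard(cpu)
            self.retired.add(cpu)
            self.active.add(spare)
            return spare
        return None  # below threshold, or no spare left to switch on
```

The sliding window is the key design choice: an occasional corrected error is normal, so only a cluster of errors in a short period triggers the failover.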
With those examples in mind, it is clear that self-healing can mean utility computing (as in marshaling resources when needed) as well as autonomic computing (as in correcting an underlying system problem when it occurs).
Richard Ptak, an analyst at Ptak, Noel & Associates in Boston, says that there is a "great deal of confusion" about the term but that to be truly self-healing, a system must perform four functions: self-monitoring, self-analysis, planning and execution. Today, systems implement the four stages "with varying degrees of sophistication," he says.
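Ptak's four functions can be pictured as one pass of a control loop. The Python sketch below wires them together for the capacity case described earlier, a server pool that runs short of headroom. The `Pool` class, the 80% trigger and the 60% target utilization are illustrative assumptions, not anything Ptak specified.

```python
import math
from dataclasses import dataclass

@dataclass
class Pool:
    servers: int
    load: float  # total demand, measured in "servers' worth" of work

    def utilization(self):
        return self.load / self.servers

def monitor(pool):
    """Stage 1, self-monitoring: collect a measurement."""
    return {"utilization": pool.utilization()}

def analyze(metrics, high=0.8):
    """Stage 2, self-analysis: decide whether there is a problem."""
    return metrics["utilization"] > high

def plan(pool, target=0.6):
    """Stage 3, planning: work out how many servers to add to reach the target."""
    needed = math.ceil(pool.load / target)
    return max(needed - pool.servers, 0)

def execute(pool, extra):
    """Stage 4, execution: carry out the plan."""
    pool.servers += extra

def heal_once(pool):
    """Run one pass of the loop and return the resulting pool size."""
    metrics = monitor(pool)
    if analyze(metrics):
        execute(pool, plan(pool))
    return pool.servers
```

A pool of 4 servers carrying 3.8 servers' worth of load is 95% busy, so one pass grows it to 7. A pool of 10 servers with the same load is left alone. Keeping the stages as separate functions mirrors Ptak's point that each can be implemented with a different degree of sophistication.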
The real challenge will be to push all four steps down to the lowest levels: devices and circuit elements, Ptak says. He predicts that manufacturers will produce self-healing chips within two years and self-healing devices built on those chips within four years, followed perhaps a year later by organic circuits that adapt themselves to correct deficiencies or failures. Manufacturers are likely to form partnerships in the coming months to unite the four phases of self-healing, adds Jasmine Noel, Ptak's partner.
Meanwhile, start-up Vieo Inc. in Austin is trying to develop a single device that handles all four functions together, to replace a series of devices built by different companies, she says. Garbani notes that Intel Corp. may dominate the server processor market in five years, which could result in low-cost self-healing chips for servers.
In the Network
Researchers at the Georgia Institute of Technology are working with IBM-donated gear to develop self-healing systems for corporate settings. They are exploring how systems can respond to outages and other events more quickly than they can today, says Karsten Schwan, director of the university's Center for Experimental Research in Computer Systems.
One area of the research will be to find ways, perhaps through "network-aware middleware," to have systems self-heal across network layers, from Layer 1, the physical layer, to Layer 7, the application layer, Schwan says.
For example, TCP today slows the rate at which it sends packets when it detects congestion at lower network layers, which is especially noticeable with rich multimedia content. "But this may not be in the interest of the servers running atop TCP," he says. With appropriate middleware, the application server could instead take steps to adapt the transmission, such as compressing the multimedia content further, marshaling more CPU resources or even sending a thumbnail of a picture instead of the full picture, Schwan says.
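Schwan's thumbnail example hints at how such middleware might decide what to send. As a minimal Python sketch, assuming the middleware can observe the throughput TCP is currently achieving, an application could pick the richest version of an image that will arrive within a delivery deadline. The function name, representation names, sizes and deadline are all hypothetical. A real system would also weigh the CPU cost of extra compression, which Schwan mentions.

```python
def choose_representation(throughput_kbps, deadline_s, sizes_kb):
    """Pick the largest version of an image that can arrive before the deadline.

    sizes_kb maps a (hypothetical) representation name to its size in
    kilobytes, e.g. {"full": 2000, "compressed": 400, "thumbnail": 20}.
    """
    budget_kb = throughput_kbps * deadline_s / 8  # kilobits -> kilobytes
    # Try representations from largest to smallest.
    for name, size in sorted(sizes_kb.items(), key=lambda kv: kv[1], reverse=True):
        if size <= budget_kb:
            return name
    return min(sizes_kb, key=sizes_kb.get)  # nothing fits: send the smallest anyway
```

With the example sizes and a one-second deadline, a 4,000-Kbit/sec link gets the compressed version and an 800-Kbit/sec link gets the thumbnail.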
As an indication of the interest in self-healing systems, the Defense Advanced Research Projects Agency is evaluating proposals to support research and testing for its Self-Regenerative Systems program. "Network-centric warfare demands robust systems that can respond automatically and dynamically to both accidental and deliberate faults," DARPA has pointed out in its solicitation for bids.

Self-Healing Processes

Source: Ptak, Noel & Associates, Boston


