System, Cure Thyself
Self-healing software and hardware are on the way.
Computerworld - Some years ago at an insurance company, a server's file-locking process kept failing, and the vendor couldn't produce a patch to prevent it from happening. As a result, a file could be accessed by more than one user at a time.
The company's IT administrator ultimately wrote a custom script that simply restarted the locking process every time it failed, about every 10 minutes. "It was better than having several hundred users mad at me," recalls the administrator, Nick van der Zweep, now the director of virtualization and utility computing at Hewlett-Packard Co.
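The article does not reproduce the administrator's script. The sketch below shows the same idea in Python. It assumes the locking process can be launched as a standalone command that exits with a nonzero status when it fails. The `supervise` function, its restart limit and its delay are illustrative choices and are not details of the original script.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=5, delay_s=1.0):
    """Run `cmd`, restarting it each time it exits with a failure status.

    Returns (restarts, last_exit_code). A production watchdog would usually
    loop indefinitely; `max_restarts` bounds the loop for this sketch.
    """
    restarts = 0
    while True:
        result = subprocess.run(cmd)  # blocks until the process exits
        if result.returncode == 0:
            return restarts, 0  # clean exit: nothing to heal
        if restarts >= max_restarts:
            return restarts, result.returncode  # give up and report the failure
        restarts += 1
        time.sleep(delay_s)  # brief pause before restarting


if __name__ == "__main__":
    # Hypothetical stand-in for the flaky locking process: it always fails.
    flaky = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(flaky, max_restarts=3, delay_s=0.1))  # prints (3, 1)
```

The script restarts only on failure and never fixes the root cause. That limitation is why the article calls it an early example rather than a full self-healing system.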
Van der Zweep's custom code was an early example of self-healing software, a general category that has drawn the attention of researchers and of vendors such as HP, IBM and Computer Associates International Inc. Many other companies are also researching and developing self-healing capabilities for their products.
For example, there are already products on the market that automatically correct, or self-heal, components or subsystems such as servers that have reached capacity. (In that case, a program can add more servers or more blades automatically.) But the focus over the next two to five years will be on developing entire networks and systems that self-heal across combinations of applications, storage and computing resources, say analysts and researchers.
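The article does not say how such products decide how many servers to add. One common approach is to size the pool so that projected utilization stays below a target. The sketch below assumes load is measured in requests per second and that each server has a known capacity. The 75% target and the figures in the example are hypothetical.

```python
import math


def servers_needed(load_rps, capacity_per_server_rps, target_utilization=0.75):
    """Smallest server count that keeps utilization at or below the target."""
    if capacity_per_server_rps <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("capacity must be positive and target in (0, 1]")
    usable = capacity_per_server_rps * target_utilization  # headroom per server
    return max(1, math.ceil(load_rps / usable))


def servers_to_add(current_servers, load_rps, capacity_per_server_rps,
                   target_utilization=0.75):
    """How many servers (or blades) to provision; never negative."""
    needed = servers_needed(load_rps, capacity_per_server_rps, target_utilization)
    return max(0, needed - current_servers)


if __name__ == "__main__":
    # A spike to 4,500 requests/s against 4 servers rated at 1,000 requests/s:
    # 4,500 / (1,000 * 0.75) = 6 servers needed, so the pool grows by 2.
    print(servers_to_add(4, 4500, 1000))  # prints 2
```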
Definitions of self-healing vary widely. "Self-healing ... connotes that when there are problems in the infrastructure, the infrastructure copes with them," says Alan Ganek, vice president of autonomic computing at IBM.
For example, Ganek says, when IBM ran the Web site for the U.S. Open tennis tournament in September, software handled workload spikes by delivering computing power from a new server to keep service levels high.
"Self-healing is the capability of any piece of technology to monitor itself and self-diagnose a problem, and then to start a solution that either bypasses or corrects the problem," says Jean-Pierre Garbani, an analyst at Forrester Research Inc. in Cambridge, Mass. For example, HP has products that can predict that a processor is about to fail by detecting single-bit memory errors in its cache, and can then automatically bring another HP processor online at the customer site.
With those examples in mind, it is clear that self-healing can mean utility computing (as in marshaling resources when needed) as well as autonomic computing (as in correcting an underlying system problem when it occurs).
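The article does not describe HP's detection logic. The sketch below illustrates the general technique from the cache-error example above. A monitor counts correctable single-bit errors in a sliding time window and fails over to a spare processor once a threshold is crossed. The threshold, the window length and the processor names are all hypothetical.

```python
from collections import deque


class ProcessorHealth:
    """Predicts a processor failure when correctable (single-bit) cache
    errors cluster: `threshold` or more errors within `window_s` seconds.
    Assumes error timestamps arrive in nondecreasing order."""

    def __init__(self, threshold=3, window_s=3600.0):
        self.threshold = threshold
        self.window_s = window_s
        self._errors = deque()  # timestamps of recent single-bit errors

    def record_error(self, timestamp):
        """Record one error; return True if the processor should be retired."""
        self._errors.append(timestamp)
        # Drop errors that have aged out of the sliding window.
        while timestamp - self._errors[0] > self.window_s:
            self._errors.popleft()
        return len(self._errors) >= self.threshold

    def reset(self):
        """Forget past errors, e.g. after switching to a fresh processor."""
        self._errors.clear()


def on_cache_error(monitor, active, spares, timestamp):
    """Return the processor that should be active after this error."""
    if monitor.record_error(timestamp) and spares:
        monitor.reset()  # the spare starts with a clean error history
        return spares.pop(0)  # bring a spare online before the failure occurs
    return active


if __name__ == "__main__":
    monitor = ProcessorHealth(threshold=3, window_s=60)
    spares = ["cpu1"]  # hypothetical processor names
    active = "cpu0"
    for t in (0, 10, 20):  # three single-bit errors within one minute
        active = on_cache_error(monitor, active, spares, t)
    print(active)  # prints cpu1
```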
Richard Ptak, an analyst at Ptak, Noel & Associates in Boston, says that there is a "great deal of confusion" about the term but that to be truly self-healing, a system must perform four functions: self-monitoring, self-analysis, planning and execution. Today, systems implement the four stages "with varying degrees of sophistication," he says.
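Ptak's four functions fit naturally into a control loop. The sketch below shows one hypothetical way to wire them together in Python. The component model, the failure counter and the "restart versus replace" rule are assumptions made for illustration. They are not part of Ptak's description or any vendor's product.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    healthy: bool = True
    failures: int = 0  # failures seen so far, consulted by the planner


def monitor(components):
    """Self-monitoring: collect a health snapshot of every component."""
    return {c.name: c.healthy for c in components}


def analyze(snapshot):
    """Self-analysis: diagnose which components have a problem."""
    return [name for name, ok in snapshot.items() if not ok]


def plan(problems, components, max_restarts=2):
    """Planning: restart a component that fails occasionally; replace one
    that keeps failing (the threshold is a hypothetical policy)."""
    by_name = {c.name: c for c in components}
    actions = []
    for name in problems:
        action = "restart" if by_name[name].failures < max_restarts else "replace"
        actions.append((action, name))
    return actions


def execute(actions, components):
    """Execution: apply the plan (simulated here by updating state)."""
    by_name = {c.name: c for c in components}
    for action, name in actions:
        c = by_name[name]
        # A restart adds to the failure history; a replacement starts fresh.
        c.failures = c.failures + 1 if action == "restart" else 0
        c.healthy = True
    return actions


def heal_once(components):
    """One pass through all four stages."""
    return execute(plan(analyze(monitor(components)), components), components)


if __name__ == "__main__":
    db = Component("db", healthy=False)
    web = Component("web")
    disk = Component("disk", healthy=False, failures=2)
    print(heal_once([db, web, disk]))  # prints [('restart', 'db'), ('replace', 'disk')]
```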
The real challenge will be to drive implementation of all four steps to the lowest levels: devices and circuit elements, Ptak says. He predicts that manufacturers will produce self-healing chips within two years and self-healing devices based on them within four, followed perhaps a year after that by organic circuits that adapt themselves to correct deficiencies or failures. Manufacturers are likely to form partnerships in coming months to unite the four phases of self-healing, adds Jasmine Noel, Ptak's partner.
Meanwhile, start-up Vieo Inc. in Austin is trying to develop a single device that handles all four functions together, to replace a series of devices built by different companies, she says. Garbani notes that Intel Corp. may dominate the server processor market in five years, which could result in low-cost self-healing chips for servers.
In the Network
Researchers at the Georgia Institute of Technology are working with IBM-donated gear to develop self-healing systems for corporate settings. They are exploring how systems can respond to outages and other events more quickly than they can today, says Karsten Schwan, director of the university's Center for Experimental Research in Computer Systems.
One area of the research will be to find ways, perhaps through "network-aware middleware," to have systems self-heal across network layers, from Layer 1, the physical layer, to Layer 7, the application layer, Schwan says.
For example, TCP, which operates at a lower network layer than the application, today slows the sending of packets when the network is congested, and rich multimedia content is especially affected. "But this may not be in the interest of the servers running atop TCP," he says. With appropriate middleware, the application server could take steps to influence the transmission. It might compress the multimedia content further, marshal more CPU resources or even send a thumbnail of a picture instead of the full picture, Schwan says.
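Schwan's thumbnail example amounts to choosing among representations of the same content based on what the network can currently deliver. The sketch below assumes that the middleware can estimate available throughput and that the application has prepared several variants in advance. The variant names, the sizes and the two-second deadline are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    name: str
    size_bytes: int  # encoded size of this representation


def choose_variant(variants, throughput_bps, deadline_s=2.0):
    """Pick the richest variant that can be sent within the deadline at the
    currently estimated throughput; fall back to the smallest one."""
    budget_bytes = throughput_bps / 8 * deadline_s  # bits/s -> bytes in deadline
    fitting = [v for v in variants if v.size_bytes <= budget_bytes]
    if fitting:
        return max(fitting, key=lambda v: v.size_bytes)
    return min(variants, key=lambda v: v.size_bytes)


VARIANTS = [
    Variant("full", 2_000_000),       # original picture
    Variant("compressed", 400_000),   # re-encoded at higher compression
    Variant("thumbnail", 20_000),     # small preview
]

if __name__ == "__main__":
    print(choose_variant(VARIANTS, throughput_bps=10_000_000).name)  # prints full
    print(choose_variant(VARIANTS, throughput_bps=2_000_000).name)   # prints compressed
    print(choose_variant(VARIANTS, throughput_bps=50_000).name)      # prints thumbnail
```

This selection logic runs above TCP. It leaves TCP's own rate control unchanged and only alters how much data the application asks it to carry.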
As an indication of the interest in self-healing systems, the Defense Advanced Research Projects Agency is evaluating proposals to support research and testing for its Self-Regenerative Systems program. "Network-centric warfare demands robust systems that can respond automatically and dynamically to both accidental and deliberate faults," DARPA has pointed out in its solicitation for bids.
[Chart. Source: Ptak, Noel & Associates, Boston]