In this era of high-tech medicine, computers can be found everywhere from the medication trolley to the operating room, but busy practitioners are often not in a position to use keyboards and mice. Sometimes the better option is to use a sophisticated interface that has evolved over millions of years — the human voice.
“A lot of environments in the hospital are hands-free, and some are eyes-free,” says Dr. Redmond Burke, chief of cardiovascular surgery at Miami Children’s Hospital. “When I am looking at a baby’s heart, I can’t look up at the monitor or enter data into the patient database.”
Dr. Burke and the department’s head of IT, Jeffrey White, are working with IBM and Teges Corp. in Coral Gables, Fla., to develop a voice-based operating-room interface to the hospital’s iRounds patient database from Teges.
At the start of a surgery, the computer reads out the patient’s name, diagnosis and the procedure to be performed. The system then informs the surgical staff when it’s time to perform actions such as giving the patient the next dose of anesthetic, reducing the chance of errors.
The doctor can also dictate information into the patient’s medical record during surgery and link a verbal description with a photo taken during the procedure, rather than have to remember the details later. This voice data is synchronized with all the feeds coming from the patient monitoring and photographic equipment, providing a more exact record of what occurred during the operation.
“This system meshes perfectly with our goal of constantly looking for technology that would enhance our performance and reduce medical errors,” says Dr. Burke. “I want all decisions to be based on accurate data.”
Driven by a combination of advancing voice-recognition technology, adoption of standards such as VoiceXML, increased processing power and networks capable of supporting voice applications, organizations are finding new ways to use voice interfaces to improve customer service, enhance security and boost employee productivity.
In addition, voice mails and teleconferences will soon become just one more type of data to be stored, replayed and searched as easily as a text document. Not yet up to the level of the starship Enterprise’s onboard system (though Dr. Burke does activate his interface by saying “computer”), the technology is rapidly approaching that level of pervasiveness and integration with other systems.
Although it’s only starting to find widespread adoption, the concept of machine-based speech recognition is far from new.
“Voice technology continues to be an evolution,” says Richard Cox, vice president of IP and voice services research at AT&T Laboratories. “Speech recognition is never perfect, but people are learning what they can do with it and how to work around its shortcomings.”
Alexander Graham Bell first proposed speech recognition back in the 1870s as a way to help the hard of hearing, and in 1952, Bell Laboratories developed a system that recognized the numbers 0-9 spoken over the phone. Later in that decade, researchers at MIT developed a system that recognized vowel sounds.
But while the research continued to advance, a lack of processing power kept voice technology from moving out of the lab and into commercial use, except for simple applications such as call center voice-response systems.
“People have been interested in the ability to search audio by context since the 1970s, but the processing power was not enough to make it viable on a very large scale,” says Ri Pierce-Grove, an analyst at Datamonitor PLC in New York.
As more processing power becomes available, speech recognition is gaining wider adoption in certain fields. According to Daniel Hong, Datamonitor’s senior voice business analyst, the worldwide market for speech self-service applications will hit $1.5 billion this year and is growing more than 20% annually. A primary driver for this is, of course, money: It costs $5 for a U.S.-based call center employee to take a call, but only 50 cents to serve the customer with a machine.
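The cost argument is easy to quantify with the per-call figures above. A back-of-envelope sketch, using a hypothetical call volume and automation rate (the article gives only the per-call costs):

```python
# Back-of-envelope savings from speech self-service, using the per-call
# costs quoted above. Call volume and automation rate are hypothetical,
# for illustration only.
AGENT_COST = 5.00    # cost of an agent-handled call (article figure)
MACHINE_COST = 0.50  # cost of a machine-handled call (article figure)

def annual_savings(calls_per_year, automation_rate):
    """Savings from shifting a fraction of calls to self-service."""
    automated = calls_per_year * automation_rate
    return automated * (AGENT_COST - MACHINE_COST)

# e.g., 1 million calls a year with 40% handled by speech self-service
print(annual_savings(1_000_000, 0.40))  # 1800000.0 -- $1.8 million
```

At $4.50 saved per automated call, even modest deflection rates pay for the technology quickly, which is why call centers have led adoption.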
And those improvements in processing power also mean that speech technology is finally starting to provide more than just a replacement for a touch-tone dial pad. Systems can now use natural language processing — the ability to recognize the meaning of words as they are used in normal conversation, rather than just a limited set of commands or keywords. Furthering this growth has been the switch from proprietary software to use of open standards such as VoiceXML.
VoiceXML Is a Driver
“The development of the VoiceXML markup language to create open standard voice interfaces has been driving the market for speech technologies for the past several years, since speech applications programmed with VoiceXML can run on multiple vendor platforms,” says Rashmi Sundararajan, an analyst at Frost & Sullivan Ltd. in Palo Alto, Calif.
Another advantage is that since VoiceXML is similar to HTML, it opens up the creation of voice-based applications to Web developers, rather than requiring highly specialized knowledge. Further clearing the path are standard interfaces, such as IBM’s WebSphere Everyplace Multimodal Environment, that give users the option of using voice, keyboard or mouse to interact with applications. Ease of use and interoperability are leading to an explosion in new voice applications, which will only increase with the creation of better tools for developers.
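That HTML-like familiarity is easy to see in a small sample. The fragment below is a minimal VoiceXML 2.0 dialog sketch; the form name, prompts and grammar file are invented for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- A single form that prompts the caller and waits for a response -->
  <form id="balance">
    <field name="account">
      <prompt>Please say your account number.</prompt>
      <!-- Grammar file (hypothetical) constrains what the recognizer accepts -->
      <grammar type="application/srgs+xml" src="account.grxml"/>
      <filled>
        <prompt>Looking up account <value expr="account"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

A Web developer who has written an HTML form can read this immediately: forms, fields and prompts replace pages, inputs and labels, and the same document runs on any compliant voice platform.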
“Speech is evolving much as the Web evolved,” says Brian L. Garr, IBM’s program director for enterprise speech solutions. “The only reason the Web is successful is that application authoring became easier. Voice authoring tools have to evolve for adoption rates to pick up.”
In addition to being able to understand the meanings of the words spoken, there has also been growth in applications that identify who is speaking on a phone call or recording. This technology could be helpful in creating transcripts of conference calls or court proceedings; it could also be used for security applications.
“More and more businesses are interested in speaker recognition for biometric security,” says Cox. “If done correctly, you can get good reliability out of it, and it will give you an extra sense of security.”
Brokerage services firm Pershing LLC, a unit of The Bank of New York Co., has been using voice software from Nuance Communications Inc. in Burlington, Mass., since 1999 for a product called TelExchange, which lets customers check their balances, review their transaction histories and conduct trades. Two years ago, the New York-based brokerage added Nuance’s biometric voice-recognition and password-reset products to its existing system so that users can reset their own LAN passwords. Pershing has 4,300 employees, and the tools have cut help desk volume by 1,500 calls per month.
“Password resets are a mundane, not a value-added, task for our service center,” says Pershing Director Peter Antonucci. “We felt we could easily take care of it with this technology.”
Speech-recognition applications fall into three broad categories: communications, search and interaction.
In the communications arena, cellular carriers have offered voice commands for hands-free operation for several years, but the technology is now making its way into enterprise applications. Vocera Communications Inc. in Cupertino, Calif., uses voice over IP and Wi-Fi-enabled badges to let mobile workers communicate instantly. A user touches a button on the badge to initiate a conversation, then gives verbal instructions to be connected to other badge wearers, place outbound calls, check messages or send e-mails. The user can place a call based on the recipient’s name, job title or other identification, such as “the nurse covering Bed 203.” The application is location-sensitive, so a caller can use it to, for example, locate another employee or ask for a connection to the nearest security guard.
The system is primarily used in hospitals, hotels, retail establishments and factories. It currently scales up to 1,800 badges, but Version 4.0, which ships this fall, will substantially raise that limit, according to Brent Lang, Vocera’s vice president of marketing.
Voice search can be done either by keyword or by figuring out the meaning of words in context. Natural Speech Communications Ltd. in Rishon Lezion, Israel, makes plug-in boards that can monitor up to 130 phone calls simultaneously, spot keywords and alert operators when specific words or phrases are spoken. Security agencies can use these boards for wiretapping, and private-sector organizations use them to mine data from phone calls.
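The keyword-spotting step such boards perform can be sketched in a few lines. This assumes the speech-to-text stage has already produced transcripts (that part is hardware-specific), and the watch-list terms and sample calls are invented for illustration:

```python
# Minimal keyword-spotting sketch: scan transcribed call text for
# watch-list phrases and flag hits. Assumes transcription has already
# happened; terms and transcripts here are hypothetical.
import re

WATCHLIST = {"wire transfer", "account number", "password"}

def spot_keywords(transcript, watchlist=WATCHLIST):
    """Return the watch-list phrases found in a call transcript."""
    text = transcript.lower()
    return sorted(p for p in watchlist
                  if re.search(r"\b" + re.escape(p) + r"\b", text))

calls = {
    "call-1": "Please confirm the wire transfer before noon.",
    "call-2": "The weather is nice today.",
}
for call_id, transcript in calls.items():
    hits = spot_keywords(transcript)
    if hits:
        print(f"ALERT {call_id}: {hits}")  # prints only for call-1
```

Real systems score candidate matches against acoustic confidence rather than exact strings, but the alert logic downstream looks much like this.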
But voice search also has broader public applications. For example, the Podzinger.com site of Cambridge, Mass.-based BBN Technologies searches podcasts. Similarly, the Blinkx.tv site of privately held Blinkx lets visitors search more than 4 million videos available on the Internet and provides the back-end video search function for sites such as Lycos.com. Blinkx has about 600 servers at its data centers in London and San Francisco that crawl the Web for video content and then convert the voices on those videos into searchable speech.
“It could be a Texan talking about country music or a well-spoken Brit speaking about Tony Blair’s government,” says Suranga Chandratillake, Blinkx co-founder and chief technology officer. “It has to understand what was said, regardless of the accent, so you need as many contextual clues as possible to determine the meaning.”
Blinkx uses language analysis and search technology from Autonomy Inc. in San Francisco. Blinkx has production systems for English and Chinese videos, and betas in French, German and Spanish. Blinkx is sticking with Web search, but the technology can be applied to internal corporate communications, such as teleconferences.
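The indexing half of such a pipeline — transcribe each video, then map every recognized word to where it was spoken — can be sketched simply. The per-word timestamps below are invented for illustration; production systems like Blinkx layer language analysis on top of this basic structure:

```python
# Sketch of a voice-search index: given transcripts with per-word
# timestamps (hypothetical data), build an inverted index mapping
# each word to (video, time) hits so queries can jump to the moment
# a word was spoken.
from collections import defaultdict

def build_index(transcripts):
    """transcripts: {video_id: [(word, seconds), ...]} -> inverted index."""
    index = defaultdict(list)
    for video_id, words in transcripts.items():
        for word, ts in words:
            index[word.lower()].append((video_id, ts))
    return index

transcripts = {
    "vid-1": [("Tony", 12.0), ("Blair", 12.4), ("government", 13.1)],
    "vid-2": [("country", 3.2), ("music", 3.6)],
}
index = build_index(transcripts)
print(index["blair"])  # [('vid-1', 12.4)]
```

The hard part, as Chandratillake notes, is getting accurate transcripts across accents in the first place; once the words and times exist, search itself is ordinary text indexing.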
Finally, voice adds an additional means of interacting with existing applications that would otherwise require a keyboard, mouse or touch screen. Last year, for example, London-based medical device manufacturer Smith & Nephew PLC added a voice interface for its digital operating-room software. Through a wireless headset, doctors can control the heating and air conditioning, the lighting, and tools, cameras and other devices through a standard interface.
Thermal Services Inc., a heating and air conditioning maintenance company in Omaha, is using a custom field service application from IBM and Openstream Inc. in Somerset, N.J. Thermal Services field technicians have tablet PCs with built-in microphones. They have the option of using a stylus or voice commands, whichever they feel more comfortable with. The tablets have a Wi-Fi connection to the trucks and an EV-DO wireless connection to headquarters. The technicians use the tablets to report when they leave for and arrive at a location, check inventory, order parts and search a customer’s service history. The tablets also tie into the back-end inventory management, pricing, billing and payroll systems.
When a technician issues the command to print an invoice, a message is sent to the dispatcher, who then confirms the next service appointment and notifies the technician. The system, which went live earlier this year, has allowed technicians to conduct one additional service call per day. It has also eliminated pricing errors and lightened the load on the accounting staff.
Thermal Services’ president, Wade Mayfield, says that one of the best aspects of the system is that it helps train new employees, which will make it easier to expand the company.
“The voice software walks the new person through the steps of the invoice,” he explains. “It is a great training tool.”
Robb is a Computerworld contributing writer in Los Angeles. You can reach him at email@example.com.