
Can You Understand Me Now?

Voice technologies now in the labs mean the answer will be yes.

By Linda Rosencrance
November 7, 2005 12:00 PM ET

Computerworld - Every few years, researchers say that automated voice recognition technology has finally arrived, but it seems as though the reality never lives up to the hype.
For the past two decades, systems that enable touch-tone responses to automated voice prompts have been used by businesses for telephone-based routing and self-service. But today, many businesses are investing in newer two-way interactive voice response (IVR) systems to move beyond the limitations of touch-tone.
Datamonitor PLC, a market research firm in New York, says the next five years will see widespread deployment of IVR across companies, and the applications of speech recognition will grow in complexity and sophistication.
In particular, says Datamonitor analyst Daniel Hong, "in the next five years, the market will witness several large voice-authentication deployments, primarily in the financial services market, because of the increasing need for security and a proliferation of PIN/password-reset applications."
One company that offers such an application today is Nuance Communications Inc. in Menlo Park, Calif. Here's how Nuance's system works: A caller spends a few moments "enrolling" his voice, creating a "voiceprint." Then, when calling the application at a later time, his voice is compared with the voiceprint on file. If there's a match, the caller is validated.
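The enroll-then-compare flow described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Nuance's method: it assumes each voice sample has already been reduced to a short list of numeric features, and the function names, the cosine-similarity comparison and the 0.95 threshold are all hypothetical choices made for the example.

```python
import math

def enroll(samples):
    """Average several feature vectors into one stored voiceprint (assumed scheme)."""
    count = len(samples)
    return [sum(column) / count for column in zip(*samples)]

def cosine_similarity(a, b):
    """Similarity of two feature vectors; values near 1.0 mean they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def verify(voiceprint, attempt, threshold=0.95):
    """Validate the caller only if the new sample is close enough to the voiceprint."""
    return cosine_similarity(voiceprint, attempt) >= threshold

# Made-up feature vectors standing in for real audio features.
voiceprint = enroll([[0.9, 0.1, 0.4], [1.0, 0.2, 0.5], [0.8, 0.1, 0.3]])
same_caller = verify(voiceprint, [0.9, 0.15, 0.4])   # close to the enrolled voice
impostor = verify(voiceprint, [0.1, 0.9, 0.2])       # very different voice
```

A production system would extract its features from audio and tune the threshold to balance false accepts against false rejects; the sketch only shows the match-against-file step.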
In the next five years, there will also be a tighter integration between speech engines and speech applications, which tend to be separate pieces of software today, says Jim Blake, a senior software engineer at San Diego-based LumenVox LLC.
That will improve the accuracy of speech engines by enabling them to get clues about users and applications when they are stumped, he says.
For example, if the speech engine in an airline reservation system recognizes the cities spoken by a user but not the flight numbers, it might ask the application for a list of possible flight numbers. "This is all done in one interaction," Blake says. "Typically, if one section of the audio isn't understood well, the caller must be reprompted for the missing information. But in our example, we are making the [speech recognizer] smarter by talking with the application."
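Blake's airline example can be sketched roughly as follows. This is a hypothetical illustration, not LumenVox's API: the route table, flight numbers, confidence scores and function names are invented, and it assumes the recognizer keeps a scored list of guesses for the part of the audio it could not resolve.

```python
# Application side: invented flight data keyed by (origin, destination).
FLIGHTS = {("Boston", "San Diego"): ["417", "1290", "88"]}

def candidate_flights(origin, destination):
    """Return the flight numbers that exist for the recognized route."""
    return FLIGHTS.get((origin, destination), [])

def resolve_flight(n_best, origin, destination, min_score=0.2):
    """Recognizer side: keep the best-scoring guess the application says is valid."""
    allowed = set(candidate_flights(origin, destination))
    valid = [(text, score) for text, score in n_best
             if text in allowed and score >= min_score]
    if not valid:
        return None  # nothing usable, so the caller would have to be reprompted
    return max(valid, key=lambda pair: pair[1])[0]

# The recognizer heard the cities clearly but is unsure about the flight number.
n_best = [("470", 0.41), ("417", 0.38), ("14", 0.12)]
chosen = resolve_flight(n_best, "Boston", "San Diego")   # "417" is the only valid guess
unknown = resolve_flight(n_best, "Boston", "Denver")     # no flights known for this route
```

Here the top-scoring guess, "470", is discarded because the application reports no such flight on the route, so the second guess wins without a reprompt.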
Real Conversations
Moreover, those applications will be able to handle a user's request in a more human way, says Peter Mahoney, vice president of ScanSoft Inc., a supplier of speech software in Burlington, Mass.
"Today, if you're talking to an automated system to make a flight reservation, you have to tell the system what it wants to hear in the way it wants to hear it," Mahoney says. The system will ask you where you're departing from, where you're going and the date


