Speak Easy

Advances in speech recognition software are extending the utility of traditional applications -- and paving the way for broader use.

The velvety voice of that nice young woman on the other end of the phone is really just digits on a disk somewhere at Verizon Communications Inc., but "she" remembers that I spoke to her a few moments earlier, before I was interrupted by another call. "I apologize if I ask some questions you already answered," the voice says. She sounds genuinely contrite.

But the virtual telephone-repair lady is just getting warmed up. "I'll test your line from here," she intones. "OK, I got the line test started. It could take up to a minute. I'll also check to see if anything's changed on the line since you last called." While the test runs, she asks me for more information about my telephone problem, and she seems to understand my every response.

Presently she says, "The line test is finished now. Unfortunately, it couldn't determine if the problem is in Verizon's network or with your equipment, so we need to dispatch a technician. ... Here we are -- I've picked up all of our technicians' current schedules. The earliest we can schedule it is on Thursday, June 3, between 8 a.m. and 6 p.m. Can someone give access to the premises at that time?" The call is soon completed, and on June 3, so is the repair.

Computerized speech has come a long way in 20 years. As Verizon's system illustrates, the technology has become smarter, easier to use and more integrated with other applications. Those technical advances, plus new products that make it easier for mainstream developers to deploy the technology, are enabling new uses for automated speech systems.

A Long and Winding Road

Research in automated speech recognition goes back to the 1930s, but serious commercialization didn't begin until 50 years later. In 1988, Dragon Systems Inc. demonstrated a PC-based speech recognition system with an 8,000-word vocabulary. Users had to speak slowly and clearly. One. Word. At. A. Time.

Image Credit: Plankton Art

The next big step came in 1990, when Dragon demonstrated a 5,000-word continuous-speech system for PCs and a large-vocabulary, speech-to-text system for general-purpose dictation. Then, in 1997, Dragon and IBM both introduced continuous speech recognition systems for general-purpose use.

Meanwhile, corporations began rolling out interactive voice response (IVR) systems. The earlier ones -- indeed, most in use today -- are menu-driven: "For your fund balance, say or press 'one.'" A few advanced systems are more conversational: "What city are you departing from?" Despite steady advances toward bigger vocabularies, lower error rates and more natural interfaces, speech products have remained specialized tools for niche markets such as PC navigation for disabled users, medical dictation and tightly constrained customer service interactions.
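
To make the menu-driven style concrete, here is a minimal Python sketch, assuming a hypothetical fund-company menu. The `MENU` tree, its action names and the `navigate` helper are invented for illustration and are not taken from any vendor's product. Each level of the tree costs the caller one prompt, whether the choice is pressed or spoken, so deeper menus mean longer calls.

```python
# Hypothetical menu tree: each key is a keypad digit; each value is either
# another submenu (dict) or the name of a leaf action (str).
MENU = {
    "1": "fund_balance",
    "2": {
        "1": "buy_shares",
        "2": "sell_shares",
        "3": {"1": "exchange_between_funds"},
    },
    "3": "speak_to_agent",
}

# Spoken digits are mapped onto the same keys as touch-tone presses
# ("say or press 'one'").
SPOKEN_DIGITS = {"one": "1", "two": "2", "three": "3"}


def navigate(menu, responses):
    """Walk the menu tree, consuming one caller response per prompt.

    Returns (action, prompts_heard). Raises KeyError on an invalid choice
    and ValueError if the responses run out before an action is reached.
    """
    node = menu
    prompts = 0
    for response in responses:
        prompts += 1  # the caller hears one menu prompt per level
        key = SPOKEN_DIGITS.get(response.lower(), response)
        node = node[key]
        if isinstance(node, str):  # reached a leaf action
            return node, prompts
    raise ValueError("caller stopped before reaching an action")


print(navigate(MENU, ["one"]))               # ('fund_balance', 1)
print(navigate(MENU, ["2", "three", "one"]))  # ('exchange_between_funds', 3)
```

Reaching the third-level action takes three separate prompts; a conversational system tries to collect the same request from a single utterance instead.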

But now, once-stand-alone speech systems are linking up with enterprise systems to access other applications and initiate transactions. As a result, these systems, formerly the domain of call center and telephony managers, are increasingly becoming something for the IT shop to worry about, if not manage.

Verizon's speech application, for example, can trigger a line test, update customer accounts, schedule repairs and create trouble tickets -- processes that require interfaces with many systems. "If you create something that's just a veneer, people get it very quickly," says Fari Ebrahimi, senior vice president for IT at Verizon. "But for customers to really get value, you need to do something with the back office."

Many of Verizon's back-office functions have been redesigned as Web services and are accessible by customers over the Web or by spoken request. The new system handles some 50,000 repair calls per day and has boosted the percentage of calls that are fully automated from 3% to 20%, Ebrahimi says. He won't say how much the company is saving in labor costs, but he says it's "millions and millions."

Verizon's National Operations Voice Portal is deployed across three geographically dispersed data centers, and calls are routed from point to point using voice-over-IP technology. The system uses speech recognition products and user interface designs from ScanSoft Inc. (which obtained much of Dragon's speech technology via acquisition). Telephony servers at each data center are connected to back-office application servers running BEA Systems Inc.'s BEA WebLogic Server.

"The technology that used to be in those telephone silos, managed by the call center manager, is now becoming standards-based and is being driven by the same application server that serves the Web pages," says William Meisel, president of TMA Associates, a speech-technology consulting firm in Tarzana, Calif. "Now the IT department can create the applications in an environment that's more familiar to them."

Better Listeners

Organizations that have deployed speech technology say that recent advancements in natural-language understanding have made the systems more acceptable to callers. "With IVR, it was 'Touch or say three,' " says Joe Alessi, vice president for marketing and IT at AAA Minnesota/Iowa. "Now we can say, 'I'd like to change my address.' "

The organization last year replaced a touch-tone-based IVR member service system with a self-service system built on the Say Anything natural-language speech engine from Nuance Communications Inc. One objective was to reduce turnover in the call center by freeing agents from handling mundane calls, such as requests for new membership cards. Another goal was to address the problem of callers bailing out of the IVR system because they found the menus confusing, Alessi says.

The new system enabled AAA to reassign 20% of its call center staff as the number of calls that could be completely automated increased. And the organization has reduced processing costs by $2 per call on average, for a total annual savings of $200,000, according to Alessi.

T. Rowe Price Group Inc. in Baltimore also upgraded its menu-driven IVR system to a free-form speech system based on IBM's WebSphere Voice Response and Voice Server with natural-language understanding capabilities. The investment company reports big savings in telephone charges because automated calls can be completed faster. "An area we struggled with is doing transactions in the system," says Nicholas Welsh, a vice president at T. Rowe Price. "They could take three to four minutes, because you have to go through five or six menu legs. Now the same transaction takes 30 seconds because you can speak it all in one sentence."

Tying speech systems to mainstream corporate IT systems, along with the use of VoIP, is making it easier to mine databases of voice records, much as companies have mined other customer records for years. For example, Continental Airlines Inc., which for three years has used eQuality Balance from Atlanta-based Witness Systems Inc. to monitor calls and capture voice records and other data, recently began using Witness' new CallMiner product to analyze call content.

IVR analysis tools usually can track and report on a caller's choices based on the menu paths the caller has taken. But CallMiner and a few other tools can go into the voice record itself and look for specific words or word combinations. Continental, for example, recorded a sample of its 5 million monthly calls, then used CallMiner to convert the dialogues to text and mine that text for particular terms. In doing so, it discovered that about 10% of the calls contain the word "reconfirm."
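
The word-spotting step can be sketched in a few lines of Python. This is a minimal illustration, assuming the calls have already been converted to text (the speech-to-text step is not shown) and using simple whole-word regular-expression matching; it is not how CallMiner itself works, and the `keyword_call_share` function and sample transcripts are hypothetical. It reports the fraction of calls that mention each word or phrase, the kind of figure behind Continental's 10% finding.

```python
import re
from collections import Counter


def keyword_call_share(transcripts, keywords):
    """Hypothetical helper (not CallMiner's API).

    For each keyword or phrase, return the fraction of calls whose
    transcript contains it at least once (case-insensitive, whole words).
    """
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    hits = Counter()
    for text in transcripts:
        for kw, pattern in patterns.items():
            if pattern.search(text):
                hits[kw] += 1  # count each call at most once per keyword
    total = len(transcripts)
    return {kw: (hits[kw] / total if total else 0.0) for kw in keywords}


# Invented sample transcripts, standing in for recorded calls.
calls = [
    "I need to reconfirm my flight to Houston",
    "Can I change my seat?",
    "Calling to reconfirm, flight 123",
    "What is the baggage allowance?",
]
print(keyword_call_share(calls, ["reconfirm", "change my seat"]))
# {'reconfirm': 0.5, 'change my seat': 0.25}
```

Multi-word phrases are matched literally, so "change my seat" must appear with single spaces between the words; a production tool would also have to cope with recognition errors in the transcripts.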

Calls to reconfirm a flight are "quite frankly low-value calls," says Andre Harris, Continental's director of reservations training and quality. She says she used the CallMiner analysis to justify the deployment of a new IVR system just for flight confirmations.

Continental currently has eight people listening to samples of calls in order to manually prepare a "call-mix report," which is used for analytical purposes by marketers and business planners at the airline. "The pilot test [of CallMiner] helped me realize very quickly that I can do this with one person instead of eight," she says.

And do it better. From the manually prepared call-mix report, Continental could see that it makes a sale on only half of all calls, but it couldn't tell why sales were lost. Telephone agents do try to find out why callers don't buy, and automated call mining will soon enable the airline to analyze those responses, Harris says.

Copyright © 2004 IDG Communications, Inc.

  
Shop Tech Products at Amazon