See Me, Hear Me . . .

New computer interfaces may respond to gestures and speech.

Computers of the future may perform tasks based on what you say to them, how you gesture or even where you look. But that doesn't mean your keyboard and mouse are going away anytime soon.

"There's never a need to throw away something that works," says Joseph Olive, a director at the multimedia communications research laboratory at Lucent Technologies Inc.'s Bell Labs.

Researchers working on new generations of interfaces are taking a pragmatic approach to how we'll interact with machines, adding support for text, speech, gaze, gesture and more, depending on circumstances and what seems to make sense.

Ideally, tomorrow's computers will be better at anticipating what users want, without needing typed commands. This will involve "context-aware" interfaces, according to Ted Selker, head of the context-aware computing research group at MIT's Media Lab. That might mean a Web application will be able to sense both your mouse and eye movements to determine whether you've visited a site before and what items most interest you—and then dynamically generate a page based on those interests.

By 2006, Selker says, "the computer will know more about why you're doing what you're doing and what it can do to help you."

Say What?

Speech-recognition software is already making inroads in telephone-based customer service applications. Today's systems are often limited to narrow uses, such as saying the name of a company to get a stock quote. Work on broader-based systems is under way, though.

"What we're vigorously researching . . . is to let you speak much more freely," Olive says. Murray Hill, N.J.-based Lucent recently tested a prototype of an automated phone operator at a financial institution. Callers could say things like "I lost my checkbook" instead of wading through menu options.

The test involved routing callers to one of about 40 departments. About 8% of the calls had to be switched to a human receptionist because the requests weren't specific enough. Among calls handled by the computer, the accuracy rate was 96%, Olive says.
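Taken together, those two figures imply an overall success rate for the prototype. The short calculation below is a back-of-the-envelope sketch, not a result reported by Lucent. It assumes the rounded percentages are exact and that the 96% accuracy applies only to the calls the computer handled; the variable names are illustrative.

```python
# Figures quoted in the article (rounded; treated as exact for this sketch).
transferred_to_human = 0.08    # share of calls switched to a receptionist
accuracy_when_handled = 0.96   # share of computer-handled calls routed correctly

# Calls the computer handled on its own.
handled_by_computer = 1 - transferred_to_human

# Share of all calls that the computer both handled and routed correctly.
correct_overall = handled_by_computer * accuracy_when_handled

print(f"Handled by computer: {handled_by_computer:.0%}")
print(f"Handled and routed correctly: {correct_overall:.1%}")
```

Under those assumptions, a little under nine in 10 callers would have reached the right department without human help.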

Speech-recognition experts believe the technology will be increasingly used in mobile applications. After all, says Mike Phillips, chief technology officer at SpeechWorks International Inc. in Boston, it's tough to design an easy-to-use knob interface for a car MP3 player that's got a few thousand songs stored in it.

Other potential uses include more sophisticated hands-free dialing as well as climate control systems.

Speech recognition should also make it easier to use a small device to access data stored on a larger one. "Basically, anything that's on your desktop will be available to you by voice," Olive predicts.

Many speech-recognition experts believe that the robust infrastructure of third-generation high-speed data transmission services will improve speech recognition in wireless applications.

For effective speech recognition, some processing should be done in the handset or on personal digital assistants and some across the network, says Bill Mark, vice president of information and computing sciences at SRI International in Menlo Park, Calif. That will help users overcome problems such as noise and choppy reception in some wireless connections.

In some cases, it would be best for the device to do more than simply process a spoken command.

For example, it might be better if a data request were sent seeking prices for a flight to Paris in October, instead of informing the computer that a caller asked, "How much is it to fly from Boston to Paris next month?"

This presents an architectural challenge, Mark notes: How can the airline's computer transmit something akin to a flight reservation request form to a wireless device so the speech can be understood in context and an appropriate request transmitted back?
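One way to picture the difference between forwarding raw speech and sending a data request is the sketch below. It is only an illustration of the idea, not a description of any system Mark or SRI has built. The `FlightPriceRequest` form, the `parse_utterance` function and the simple pattern match are all hypothetical, and a real device would use a speech recognizer rather than typed text.

```python
import json
import re
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class FlightPriceRequest:
    """Hypothetical request form an airline's computer might send to the device."""
    origin: str
    destination: str
    year: int
    month: int


def parse_utterance(text: str, today: date) -> FlightPriceRequest:
    """Fill in the form from recognized text, resolving "next month" on the device."""
    match = re.search(r"from (\w+) to (\w+) next month", text, re.IGNORECASE)
    if match is None:
        raise ValueError("utterance does not fit the flight-price form")
    origin, destination = match.groups()
    # "Next month" only makes sense relative to the caller's current date.
    if today.month == 12:
        year, month = today.year + 1, 1
    else:
        year, month = today.year, today.month + 1
    return FlightPriceRequest(origin, destination, year, month)


# A caller asking in September is asking about October.
request = parse_utterance(
    "How much is it to fly from Boston to Paris next month?",
    today=date(2002, 9, 15),
)
payload = json.dumps(asdict(request))  # compact data sent over the network
print(payload)
```

The point of the design is that the device sends a few unambiguous fields instead of audio or a vague phrase, so the airline's computer never has to guess what "next month" meant.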

In a Gaze

Researchers are also investigating how computers might respond to eye movements. Once again, the idea is to supplement the keyboard and mouse, not to replace them.

Add an eye-tracking camera to a laptop and a system "can actually see what you're looking at on the screen," says Daniel M. Russell, senior manager at the user sciences and experiences research laboratory at IBM's Almaden Research Center in San Jose.

This could help make PCs and laptops easier to use. Russell says he envisions a system with a "Jump" key below the space bar. To issue a command, you would look at an on-screen menu option to get a drop-down list, then select an option by looking at it. To execute, you would hit Jump. "It's like the computer knows what you want to do," Russell says. "It just speeds up."

Cameras are too large and costly for commercial implementation of such a system now, he says, but that may change within five years or so.

MIT's Project Oxygen is also researching a way to combine vision and speech so a computer could respond to a user speaking and pointing.

"I really don't think we're going to replace Microsoft Word anytime soon with a gesture interface," says Trevor Darrell, an assistant professor and artificial intelligence researcher at MIT. The idea is to use alternatives where they make sense, including kiosks, cars and meeting rooms.

"The broadest goal is to make computers more interesting or useful," Darrell says.


Speech Recognition

The caller's words are captured and digitized by the speech-recognition system.

The digitized voice is split into individual frequency components, called spectral representations.

The components are translated into phonemes.

Complex models and algorithms determine a likely translation.
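The first two steps can be sketched in a few lines of code. The example below is a toy, not SpeechWorks' method: it "digitizes" a synthetic tone instead of a voice, splits it into frequency components with a naive discrete Fourier transform, and then stands in for the later steps by matching the strongest component to the nearest of three made-up sound labels. The sample rate, frame size and `TEMPLATES` table are assumptions chosen for illustration; real recognizers use statistical models of phonemes and words.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second; assumed telephone-quality audio
FRAME_SIZE = 200    # 25-millisecond frames at 8 kHz (40 Hz per frequency bin)

# Hypothetical sound labels keyed by a typical frequency in Hz (not real phonemes).
TEMPLATES = {"low": 400.0, "mid": 1200.0, "high": 2400.0}


def digitize(freq_hz, n_samples):
    """Stand-in for a microphone and converter: sample a pure tone."""
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE) for n in range(n_samples)]


def spectrum(frame):
    """Magnitude of each frequency component, via a naive discrete Fourier transform."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2)
    ]


def dominant_frequency(frame):
    """Frequency in Hz of the strongest component in the frame."""
    magnitudes = spectrum(frame)
    strongest_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return strongest_bin * SAMPLE_RATE / len(frame)


def classify(frame):
    """Toy substitute for phoneme models: pick the nearest template label."""
    freq = dominant_frequency(frame)
    return min(TEMPLATES, key=lambda label: abs(TEMPLATES[label] - freq))


samples = digitize(1200.0, FRAME_SIZE * 2)  # step 1: capture and digitize
frames = [samples[i:i + FRAME_SIZE] for i in range(0, len(samples), FRAME_SIZE)]
labels = [classify(frame) for frame in frames]  # step 2, plus the toy matching
print(labels)
```

Each frame here contains a single tone, so one component dominates. Speech mixes many components that shift from frame to frame, which is why the final step calls for complex models rather than a lookup table.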

Source: SpeechWorks International Inc.

Copyright © 2002 IDG Communications, Inc.
