Speech recognition: Your smartphone gets smarter

Voice recognition, never a big hit on the desktop, has finally taken off on smartphones.

When we were kids, my friends and I used to play a game where we fantasized about which technologies from Star Trek were most likely to be real-world inventions within our lifetimes. The transporter and warp drive -- not likely. But the communicator, the voice-commanded computer and the universal translator -- very likely.

When speech recognition arrived on the computer desktop, it seemed like a great idea -- but for most people, it wasn't a replacement for the keyboard and mouse. Now speech recognition technology is being put to use in a whole new environment: phones. And its presence there is further driving its use and development in directions it might never have headed on the desktop.

Shoebox, an early-1960s experimental speech recognition system that solved spoken arithmetic problems, was created by IBM, which is marking the 100th anniversary of its founding this year.

Speech recognition first appeared in the 1950s as a primitive technology that was little more than a curiosity. In the early 1960s, IBM's Shoebox device could recognize 16 spoken words and respond to simple mathematical requests, such as "three plus four total."

DragonDictate by Dragon Systems was probably the first speech-recognition program for the PC, released in the early 1990s for DOS computers. It could recognize only individual words, spoken one at a time. Over time it evolved into Dragon NaturallySpeaking (now in Version 11 and owned by Nuance Communications), which can transcribe speech delivered in a normal conversational voice and at a normal pace.

Speech recognition on the desktop had two big limitations. First, for a program to work with a high degree of accuracy, it had to be trained to recognize the user's speech patterns. The native speech-to-text technology in Windows Vista and Windows 7, as well as third-party products like Dragon NaturallySpeaking, still requires a user-training period to be useful.

The second limitation was the prevalence of the keyboard. Most people were already in the habit of typing, not talking, so speech control faced the same uphill battle for adoption as the Dvorak keyboard layout. Why learn Dvorak when plain old QWERTY was readily available and worked fine?

Abhi Rele, senior product manager of Microsoft's TellMe team, a group responsible for developing speech recognition technologies for multiple environments, concurs on this point: "In the desktop environment, users have easy access to other interaction modalities -- namely, keyboard and mouse -- and therefore the use of speech is primarily targeted towards speech enthusiasts."

For broader adoption, speech-controlled computing needed two things: better out-of-the-box performance, without a lengthy training period, and a venue where speech was already king, so to speak. One such venue has been on the rise for a long time: mobile phones.

Matt Revis, vice president of product management and marketing at Nuance, explains the differences between the desktop and mobile environments like this: "The desktop is a stationary environment focused entirely on desktop use cases, and so speech for the desktop follows that task flow: supporting office apps, Web browsing, communications, etc. In mobile, speech is more directed to supporting a variety of lifestyle scenarios: professionals on the go, out-and-about fun, hands-free [calling] and so on."

Gartner analyst Tuong Nguyen agrees that voice makes more sense in a mobile context. "From a usage perspective," he says, "the value of voice recognition on a handheld device is much greater. It adds a user-friendly, intuitive method of input."

This is especially true, Nguyen adds, when the alternative to speaking a simple declarative statement is digging through a slew of menus or struggling with a tiny on-screen keyboard: "With the growing adoption of touch-only devices (no physical keys), voice recognition is used to enhance data entry/input. It also supports hands-free requirements or legislation."

