
Speech recognition: Your smartphone gets smarter

Voice recognition, never a big hit on the desktop, has finally taken off on smartphones.

By Serdar Yegulalp
March 16, 2011 06:00 AM ET

Computerworld - When we were kids, my friends and I used to play a game where we fantasized about which technologies from Star Trek were most likely to be real-world inventions within our lifetimes. The transporter and warp drive -- not likely. But the communicator, the voice-commanded computer and the universal translator -- very likely.

When speech recognition arrived on the computer desktop, it seemed like a great idea -- but for most people, it wasn't a replacement for the keyboard and mouse. Now speech recognition technology is being put to use in a whole new environment: phones. And its presence there is further driving its use and development in directions it might never have headed on the desktop.

Shoebox, an early-1960s experimental speech recognition system that solved spoken arithmetic problems, was created by IBM, which is marking the 100-year anniversary of its founding this year.

History

Speech recognition first appeared as a primitive technology in the 1950s, as little more than a curiosity. In the early 1960s, IBM's Shoebox device could recognize 16 spoken words and could respond to simple mathematical requests, such as "three plus four total."

DragonDictate by Dragon Systems was probably the first speech-recognition program for the PC, released in 1990 for DOS computers. It could recognize only individual words, spoken one at a time. It evolved over time into the product Dragon NaturallySpeaking (now in Version 11 and owned by Nuance Communications), which can transcribe speech delivered in a normal conversational voice and at a normal pace.

Speech recognition on the desktop had two big limitations. First, in order for the program to work with a high degree of accuracy, it had to be trained to recognize the speech patterns of the user. Windows Vista's and Windows 7's native speech-to-text technology, and third-party products like Dragon NaturallySpeaking, still require a user-training period to be useful.

The second limitation was the prevalence of the keyboard. Most people were already in the habit of typing, not talking, and so speech control faced the same uphill battle for adoption as the Dvorak keyboard layout. Why learn to use Dvorak when plain old QWERTY was readily available and worked fine?

Abhi Rele, senior product manager of Microsoft's TellMe team, a group responsible for developing speech recognition technologies for multiple environments, concurs on this point: "In the desktop environment, users have easy access to other interaction modalities -- namely, keyboard and mouse -- and therefore the use of speech is primarily targeted towards speech enthusiasts."

What speech-controlled computing needed for broader adoption was two things -- better out-of-the-box usage and a venue where speech was already king, so to speak. One such venue has been on the rise for a long time: mobile phones.

Matt Revis, vice president of product management and marketing at Nuance, explains the differences between the desktop and mobile environments like this: "The desktop is a stationary environment focused entirely on desktop use cases, and so speech for the desktop follows that task flow: supporting office apps, Web browsing, communications, etc. In mobile, speech is more directed to supporting a variety of lifestyle scenarios: professionals on the go, out-and-about fun, hands-free [calling] and so on."

Gartner analyst Tuong Nguyen agrees that voice makes more sense in a mobile context. "From a usage perspective," he says, "the value of voice recognition on a handheld device is much greater. It adds a user-friendly, intuitive method of input."

This is certainly true, Nguyen adds, if the alternative to speaking a simple declarative statement is to dig down through a slew of menus or struggle with tiny on-screen keyboards: "With the growing adoption of touch-only devices (no physical keys), voice recognition is used to enhance data entry/input. It also supports hands-free requirements or legislation."

Making it work

Speech recognition works by making statistical models of spoken language. "To recognize spoken words," says Google product manager Amir Mane, "we compare the input speech to a statistical model of the language and try to find the closest match -- the system's best guess at what the user said."

Statistical models of a language require a great deal of storage to be practical. "[They] must cover all of the fundamental sounds of the language (phonemes), all of the words, and all of the different ways that the words can be strung together in the spoken language," Mane says. On top of that, there are accents, variations in sex and age, regional pronunciations, word choices ("soda" vs. "cola" vs. "pop") and so on.

Mane notes that Google Voice Search's statistical model requires three elements: acoustic models, language models and a lexicon. "An acoustic model is created by taking audio recordings of speech and the transcriptions of what was said, and using the two to create a representation of the phones -- the basic components of all words in a given language," he says.
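To make the role of the lexicon a little more concrete, here is a minimal Python sketch of a lookup from phone sequences to words. Everything in it is an assumption made for illustration: the phone symbols, the three-word LEXICON table and the function name words_for_phones are hypothetical and are not taken from Google Voice Search or any real recognizer, which would match phones probabilistically rather than exactly.

```python
# Hypothetical toy lexicon: each word maps to one sequence of phones.
# The phone symbols are illustrative only, not from any real system.
LEXICON = {
    "soda": ("s", "ow", "d", "ah"),
    "cola": ("k", "ow", "l", "ah"),
    "pop": ("p", "aa", "p"),
}


def words_for_phones(phones):
    """Return, in alphabetical order, the words whose listed
    pronunciation matches the given phone sequence exactly."""
    target = tuple(phones)
    return sorted(word for word, pron in LEXICON.items() if pron == target)
```

Calling words_for_phones(["p", "aa", "p"]) returns a list containing only "pop", while a sequence that is not in the table returns an empty list.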

The language model involves figuring out what words are likely to follow other words, and using that as a way to improve recognition accuracy. "The word 'empire' will be followed by the words 'state' or 'strike' [as in The Empire Strikes Back] more often than it is followed by the words 'diverse' or 'guava,' " Mane explains. Collecting data from the field helps continuously improve the language model and the lexicon.
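The idea of scoring which words tend to follow other words can be sketched with a simple bigram count. This is a minimal illustration, not Google's method: the three-sentence CORPUS is made up, the function names train_bigrams and next_word_probability are hypothetical, and a production language model would be trained on vastly more data and would smooth its estimates so that unseen word pairs do not get a probability of exactly zero.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word is immediately followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probability(counts, prev, nxt):
    """Estimate P(nxt | prev) as a plain relative frequency (no smoothing)."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return followers[nxt] / total


# Made-up training text, purely for illustration.
CORPUS = [
    "the empire state building",
    "the empire strikes back",
    "empire state of mind",
]
MODEL = train_bigrams(CORPUS)
```

With this toy corpus, next_word_probability(MODEL, "empire", "state") is higher than next_word_probability(MODEL, "empire", "guava"), which is zero because "guava" never follows "empire" in the training text. A recognizer can use such scores to prefer the more plausible of two similar-sounding transcriptions.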

Google isn't the only company crowdsourcing its recognition data. Speech-recognition app Vlingo puts cookies on users' phones to continuously build speech models based on users' own feedback, combined with models based on similar speakers.


