There’s some way to go before Siri controls the world

Tom Hebner, Worldwide Leader of the Cognitive Innovation Group at Nuance Communications, discusses voice search and the emerging future of AI.


I recently spent some time with Tom Hebner, Worldwide Leader of the Cognitive Innovation Group at Nuance Communications. We discussed voice search and the emerging future of AI.

The voice user interface

“Voice technologies being deployed by large technology companies is awesome for everyone in the industry because it really shows the power of the technologies, but also shows the edges, too, because people get all hyped up and think it does more than it currently can,” Hebner said.

“The promise of voice technology is that we will need to learn less about the computer. Like, back in the day when you had to type in a terminal, you had to really understand how to code. Then, with graphical user interfaces, it became a little easier,” he said.

“But … with all the home devices, … it’s like you’re playing Jeopardy and must figure out the question to get the answer you want rather than just speaking naturally.”

In part, this is because users must figure out the language the machines understand, rather than being able to express themselves in natural language, he explained.

A voice-based user interface needs to be both discoverable and natural.

The missing piece is contextual understanding.

Voice design for the rest of us

“There’s a small handful of us that realized this a long time ago and said, ‘OK, we have to make sure that we are giving the time, energy, and effort needed to craft the best conversation, to do conversation experience design and what’s called voice user interface design,’” Hebner said.

Voice design is a skill in its own right: voice designers combine linguistic, aesthetic, and engineering expertise to build voice-based user interfaces capable of effective conversation.

However, only a couple hundred people in the world have these skills, while multiple competitors are attempting to enter the space.

This creates a skills shortage that slows the development of these technologies. Hebner expects the skills to proliferate over time, but until then the shortage will hold the industry back.

Getting into context

Apple’s HomePod, like other smart speaker systems, is a profoundly complex piece of technology, combining music playback, voice intelligence, networking, and pattern matching skills that took decades to develop.

At the same time, its limitations underline just how much further voice and AI intelligence need to go.

Tom Hebner (Nuance Communications)

“I was in a meeting yesterday, and one of the guys was explaining how awesome it is when he goes downstairs and asks for a song and it just plays,” Hebner said. “This led to a conversation where we said, ‘But isn’t it more powerful if you come downstairs and the song is already playing, because this is what you play every day?’”

Contextual intelligence is the current holy grail of this part of the technology industry. It’s why Siri Shortcuts exists.

Personalization machines

“These are hard problems to solve,” said Hebner. “Especially when you’re building for the masses. You know, just because they wanted that song yesterday doesn’t mean they want it today. Did they even stay in the same room? Do they even really like the track, or do they just leave it on?”

The development of personalized, contextually relevant interfaces requires that a level of intelligence be woven into the system that moves beyond command and control (“Hey, Siri, send an email”) to correctly predict that you need to send an email and prepare it for you in advance.

“It’s more of a conversational assistant that understands context about your life, your preferences, and what you want to do and proactively serves up what you want,” Hebner said.

This is a puzzle that requires multiple contexts (where, who, what time, what task, when, how, and so on) and multiple layers (how, why, where, and who to, for example).
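As a toy illustration of the frequency-based prediction Hebner describes, a system might learn which track a user most often plays at a given hour and queue it proactively. This is only a sketch under simple assumptions (the class and its play-history log are hypothetical, not any real assistant's API):

```python
from collections import Counter, defaultdict
from datetime import datetime


class PlaybackPredictor:
    """Toy contextual model: predicts the track a user is most
    likely to want based only on the hour of day."""

    def __init__(self):
        # hour of day (0-23) -> Counter of tracks played in that hour
        self.history = defaultdict(Counter)

    def record_play(self, track: str, when: datetime) -> None:
        """Log one playback event."""
        self.history[when.hour][track] += 1

    def predict(self, when: datetime, min_plays: int = 3):
        """Return a track worth pre-queuing, or None if the signal
        is weak -- reflecting Hebner's caveat that wanting a song
        yesterday doesn't mean wanting it today."""
        counts = self.history.get(when.hour)
        if not counts:
            return None
        track, plays = counts.most_common(1)[0]
        return track if plays >= min_plays else None
```

The `min_plays` threshold is the crude stand-in for the hard part: deciding when a habit is strong enough to act on without asking. A real system would fold in far more context (room, listener identity, day of week) before committing.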

Inside the walls

We already ask voice assistants complex questions to recover facts, freeing us to spend our time connecting those facts together rather than memorizing them. (This is a big point also made by Apple's VP of Education, John Couch, in his book, Rewiring Education.)

In education, we might see data used to assess learning outcomes, teaching effectiveness, and engagement with learning materials. We are also seeing AI with machine vision making an impact in medicine, in cutting-edge solutions such as Triton's Sponge and beyond.

Hebner said the best spaces for innovation in voice and AI will be in the enterprise.

This is because enterprises tend to be more controlled environments, with a narrower set of users and a more focused set of typical problems and solutions than exist in the wider world. This makes it possible to develop powerful solutions for specific challenges.

Take conference calls.

One thing Nuance is working on in its lab is a solution that measures the use of buzzwords in a conversation, delivering a post-conversation score on the premise that too many such words is a bad sign.
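Nuance hasn't published how its lab tool works, but the core idea could be sketched as a simple transcript scorer. The buzzword list below is hypothetical, chosen purely for illustration:

```python
import re

# Hypothetical buzzword vocabulary; the real tool's list is unknown.
BUZZWORDS = {"synergy", "leverage", "paradigm", "disrupt", "pivot"}


def buzzword_score(transcript: str) -> float:
    """Return the fraction of words in a conversation transcript
    that are buzzwords. Higher scores are worse."""
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in BUZZWORDS)
    return hits / len(words)
```

A production system would work from a speech-to-text transcript and likely weight terms by context, but even a ratio like this is enough to close the feedback loop the next paragraph describes.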

Such applications of machine intelligence within enterprise collaboration could create a never-before-possible feedback loop, one that may help nurture the soft skills industry will increasingly rely on as more business processes become automated.

I hope you’ve found these short excerpts from our (far more extensive) chat interesting.


