Google buys speech synthesis firm, Phonetic Arts


Google's acquiring a speech synthesis firm, Phonetic Arts. Its technology is best known as a speech engine for gaming. Why on earth would Google need one of those? Let's talk about it, in The Long View...

Today's speech synthesis technology is imperfect, at best. The human ear and brain have evolved to be extremely sensitive to tiny nuances in human speech, so any synthetic imperfections can be extremely off-putting.

You'll know what I'm talking about if you've ever listened to one of those Xtranormal clips on YouTube. They're uncomfortable to listen to, to say the least. But judging by the company's samples, Phonetic Arts' technology is far superior. As Google's Mike Cohen blogged:

There’s still a lot to do. ... Phonetic Arts’ team of researchers and engineers work at the cutting edge of speech synthesis.
...
We already have a strong engineering center in London and look forward to welcoming Phonetic Arts to the team. ... We’re confident that together we’ll move a little faster towards that Star Trek future.

What does Phonetic Arts do? Its proprietary technology was originally created for games developers, to help stitch together fragments of speech. For example, if you're publishing a football game, you might want a commentator to say what's going on in the game, as if it was on TV.

But it's impractical to have a voiceover artist record every possible combination of lines. So, you'll have a list of players and a list of actions, and the game needs to stitch the fragments together, to say things like, "Puricelli cuts infield, tackled by Bracken."

It turns out that stitching these fragments of speech together, so that the result sounds entirely natural, is fiendishly difficult. You also want to add a level of natural randomness in the intonation, so the speech doesn't sound exactly the same every time.
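To make the idea concrete, here's a minimal sketch of the easy part only: picking which recorded fragments to play, with a random choice between alternative takes so a line doesn't sound identical every time. Everything in it is an assumption for illustration (the `PLAYERS` and `ACTIONS` inventories, the `.wav` file names and the `build_line` helper are all made up), and it says nothing about how Phonetic Arts actually does it. The fiendishly difficult bit, making the joins sound natural, isn't shown at all.

```python
import random

# Hypothetical fragment inventory: each phrase maps to a list of alternative
# recorded takes. The file names are invented for this example.
PLAYERS = {
    "Puricelli": ["puricelli_a.wav", "puricelli_b.wav"],
    "Bracken": ["bracken_a.wav", "bracken_b.wav"],
}
ACTIONS = {
    "cuts infield": ["cuts_infield_a.wav", "cuts_infield_b.wav"],
    "tackled by": ["tackled_by_a.wav", "tackled_by_b.wav"],
}


def build_line(parts, rng):
    """Return (text, clips) for a sequence of (inventory, phrase) pairs.

    One take is chosen at random for each phrase, so repeated calls can
    yield different clip sequences for the same sentence.
    """
    clips = [rng.choice(inventory[phrase]) for inventory, phrase in parts]
    text = " ".join(phrase for _, phrase in parts)
    return text, clips


rng = random.Random(0)  # seeded only so the example is repeatable
text, clips = build_line(
    [
        (PLAYERS, "Puricelli"),
        (ACTIONS, "cuts infield"),
        (ACTIONS, "tackled by"),
        (PLAYERS, "Bracken"),
    ],
    rng,
)
print(text)   # the sentence being assembled
print(clips)  # one chosen take per fragment, in playback order
```

The point of splitting things this way is that the voiceover artist records each player name and each action a few times, rather than every possible sentence.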

The company's technology doesn't just do a remarkably good job of stitching speech fragments together; it can also create the fragments from normally recorded, continuous speech. The company claims it's not necessary to feed it a script or record the fragments separately.

Why does Google need this? As more and more day-to-day computing tasks are performed on small, handheld devices -- sooner or later, we're going to stop calling them "phones" -- there's a greater need for high quality speech input and output.

Imagine searching for a nearby Greek restaurant from your Android phone. You could speak your request; Google could ask what sort of price range you want, then let you know there's one just around the corner. It might even read some reviews to you. All in a natural speaking voice that doesn't scream computer!

 
As an aside, David Braben is on the company's board of directors. Recognize that name? He was one of the two young nerds behind the ground-breaking 1984 game, Elite.

 
 

Richi Jennings, blogger at large
  Richi Jennings is an independent analyst/consultant, specializing in blogging, email, and security. A cross-functional IT geek since 1985, you can follow him as @richi on Twitter, pretend to be richij's friend on Facebook, or just use good old email: TLV@richij.com.

You can also read Richi's full profile and disclosure of his industry affiliations.
