Speaking machines still don’t exist, despite being put on the wish list at the 1956 A.I. conference at Dartmouth College. That was nearly 60 years ago, and we seem no closer to accurate speaking machines. What went wrong?
Today we will look at how speaking machines remain out of reach, waiting on a scientific breakthrough. Systems created with massive investment from organizations like DARPA and IBM just weren't accurate. Subsequent systems improved, but remained inaccurate compared with the goal.
The power of science in problem solving is remarkable: we see repeatedly that science drives progress where engineering alone cannot.
Scientists set language requirements
Yehoshua Bar-Hillel, a pioneer in machine translation and formal linguistics, argued in 1958 that a Universal Encyclopedia (UE) is needed before accurate machine translation is possible. The UE provides the mechanism to understand what was said, and once you know what was said, you can translate it accurately. At the time, a UE was considered beyond our capabilities.
A decade later, in 1969, John Pierce of Bell Labs proposed a higher-level plan: roughly speaking, get the science right, then do the engineering. The science should explain how language works; the engineering should implement that understanding to the level of a human speaker, and only then implement speech recognition.
If we follow the 1969 plan and include the 1958 requirement, we should be able to create accurate, speaking artificial intelligence.
Given the answer, what did we do to get here?
A.I. has suffered ongoing funding winters as, in Pierce’s 1960s words about machine translation and speech recognition, many supporters behaved “like mad inventors or untrustworthy engineers.”
DARPA re-initiated its funding of language technology in 1985, which has resulted in many improvements in the engineering of speech technology. To avoid the risks of “glamour and deceit,” objective evaluation metrics were set.
The trouble is, they did not heed the recommendation to put science first. How can you create a speech recognition system without understanding the language like a native speaker? Similarly, how can you translate to a target language without understanding the source? The bar was set too low.
Fixating on statistics, not pure science
Human vision illustrates the scale of combinations involved in a brain. Start with the number of eye sensors -- roughly 6 million cones for color and 120 million rods for grayscale. If you printed the on/off states of those sensors on A4 paper, you would receive roughly 5 reams of paper 10 times a second from each eye. And the combinations from each sensor need to be dealt with many times per second.
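The paper analogy above can be checked with a back-of-envelope calculation. The sketch below assumes very dense printing of about 50,000 characters per A4 page (one character per on/off state) and 500 sheets per ream; those two figures are my assumptions, not from the article -- the sensor counts are the article's:

```python
# Back-of-envelope check of the retina data-rate analogy.
CONES = 6_000_000         # color sensors per eye (from the article)
RODS = 120_000_000        # grayscale sensors per eye (from the article)
CHARS_PER_PAGE = 50_000   # assumed: very dense printing, one char per state
SHEETS_PER_REAM = 500     # assumed: standard ream size

states_per_eye = CONES + RODS            # ~126 million on/off states
pages = states_per_eye / CHARS_PER_PAGE  # pages per snapshot
reams = pages / SHEETS_PER_REAM          # reams per snapshot

print(f"{pages:,.0f} pages, about {reams:.0f} reams per snapshot, per eye")
```

At roughly 10 snapshots per second, that is about 50 reams of paper per second from each eye -- before considering the combinations across sensors.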
Does this sound like something a human computer, as Turing modeled, would do? Is there a better design? If computers compress and duplicate, perhaps brains centralize and expand. We will follow that thought next time.
To deal with the volume of data, and the combinations, involved in speech (and translation and understanding), statistics can be used. Transistors apparently rely on the statistics of quantum theory, so it may seem a good plan.
Statistics aren't magic, but they get (inaccurate) results
Let’s take 1971 as the point when investment in statistical systems really escalated, with DARPA funding speech recognition research for a five-year period. Today’s speech recognition systems are all based on work that originated back then.
Computational linguistics specializes in the application of statistical models. Often the results are right, and often they are wrong: there are theoretical categories of language that these models just don't handle properly. And when something doesn't work in theory, it won't work in practice.
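The hit-or-miss character of statistical language models shows up even in a toy example. The sketch below is a minimal bigram model over a made-up four-sentence corpus (my assumption, not any production system): it happily scores a sentence pattern it has seen, but assigns zero probability to an equally grammatical sentence whose word pair never appeared in training:

```python
from collections import defaultdict

# Toy bigram language model trained on a tiny assumed corpus.
corpus = [
    "the dog barks",
    "the cat sleeps",
    "a dog sleeps",
    "the dog sleeps",
]

counts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    for prev, word in zip(words, words[1:]):
        counts[prev][word] += 1

def probability(sentence):
    """Product of bigram relative frequencies; 0 if any bigram is unseen."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(words, words[1:]):
        total = sum(counts[prev].values())
        p *= counts[prev][word] / total if total else 0.0
    return p

print(probability("the dog sleeps"))  # seen pattern: nonzero probability
print(probability("the cat barks"))   # perfectly grammatical, yet scored 0
```

The zero for "the cat barks" isn't a bug in the code; it is the model faithfully reporting that grammaticality and corpus frequency are different things -- exactly the kind of theoretical gap the statistics cannot paper over.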
Today’s speech recognition and translation systems are built around statistical models and, as predicted more than 45 years ago by Pierce, are inaccurate, not even approaching human level.
That said, the systems have improved and get results in limited cases, like the command-based voice-controlled systems used by Facebook’s Wit.ai. But there is no obvious path from such a system to one with natural speech interaction, because compromise, and complexity, is baked in.
DARPA's focus on incremental improvement was necessary to avoid an unfocused project, but without a goal of natural interaction with people, the end result is disappointing.
Users just think the speech from A.I. doesn't work.
Today we looked at the work of legendary scientists to set out a plan for speaking machines, and at the push from engineers, like Frederick Jelinek while at IBM, to come up with alternative, compromise solutions.
Some leading engineers today claim that the statistical models won, but a technology battle isn't won when the technology doesn't work. With the benefit of hindsight, if the goal was human-like accuracy in speech, the statistical analysis of computational linguistics has failed.
Leaders like John Pierce predicted that without improvements to the science, the engineering would fail. Now that the engineering has demonstrably failed, how do we produce human-like accuracy in language understanding, translation, and conversation?
The science restarts by looking at brains -- the only machines that work today. Often, brain researchers think of a brain cell, a neuron, as being like a little processor. So they argue a brain is like 100 billion computers.
Next time I’ll explain a better brain model based on pattern matching, not processing. This change is important, because the world of A.I. has looked to force computation into almost everything.
This article is published as part of the IDG Contributor Network.