DARPA teaches robots to cook by watching YouTube

CS professor Yiannis Aloimonos (center) led a team that programmed robots to learn by watching YouTube videos.

Credit: Univ. Maryland/John T. Consoli

The Pentagon's most advanced tech-development wing has succeeded in developing a mathematical language so advanced it could allow robots to learn by watching YouTube videos.

The Defense Advanced Research Projects Agency (DARPA) issued a series of grants in 2011 to fund research into a mathematical language that would allow the military to combine data from drone video, cell-phone intercepts, targeting radar and any other available method of sensing the outside world into a single stream of data. That was only the initial goal.

The real intention was to create a mathematical model that would allow advanced sensors to figure out which of the things they see or hear are important and filter out those that are trivial before passing them along to humans. Sensors designed only to see what's happening, not decide whether it's important, "process their signals as if they were seeing the world anew at every instant," according to the 2011 solicitation for proposals under the Mathematics of Sensing, Exploitation, and Execution (MSEE) project.

"The MSEE program initially focused on sensing, which involves perception and understanding of what’s happening in a visual scene, not simply recognizing and identifying objects," according to Reza Ghanadan, a program manager in DARPA’s Defense Sciences Office.

“We’ve now taken the next step to execution, where a robot processes visual cues through a manipulation action-grammar module and translates them into actions," Ghanadan said.

Developing an algorithm that can identify objects and actions and figure out which are important and which to ignore (something even the human brain does only imperfectly and inconsistently) requires machines that are capable not only of learning, but of learning "in an unsupervised or semi-supervised fashion," and of processing data in ways that mimic some aspects of human judgment, according to the original requirement.

The first result of that effort is a robot programmed by researchers at the University of Maryland that was able to teach itself to use kitchen tools by watching humans do it in videos on YouTube, according to a release yesterday from DARPA, an announcement from the University and a research paper presented yesterday at the Association for the Advancement of Artificial Intelligence Conference in Austin, Texas.

The project, led by computer scientist Yiannis Aloimonos, modified several semi-humanoid Baxter Research Robots by adding a pair of data-processing modules designed as convolutional neural networks (CNNs), a design that also powers voice-recognition systems in smartphones and facial-recognition software used in security biometrics.

One of the two CNN modules was designed to recognize objects. The other was programmed to track movements, not only following objects in motion but also creating an abstracted mathematical model that would help identify how each part of a movement related to the others and how, eventually, the robot could reproduce the movement itself.

Camera-equipped monitoring systems watching someone pick up a pitcher and pour water would interpret the action as thousands of snapshots of individual instants in which hands, arms, pitchers and water were in different positions. The CNN abstraction was designed to show how those snapshots were related and identify the arrival of water in a pan as the goal of all the rest, and imitate both the process of getting it there and the result.
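The idea of turning thousands of snapshots into a few related actions with a goal can be illustrated with a minimal sketch. It is not the Maryland team's code: the `FrameLabel` fields, the frame counts and the rule that the last action is the goal are all assumptions made for illustration, and the per-frame labels stand in for whatever the two recognition modules would actually produce.

```python
from itertools import groupby
from typing import List, NamedTuple


class FrameLabel(NamedTuple):
    """Hypothetical per-frame output of the two recognizers."""
    tool: str    # what is doing the acting (from the object recognizer)
    action: str  # what kind of movement it is (from the movement tracker)
    target: str  # what the action is applied to


class Segment(NamedTuple):
    label: FrameLabel
    first_frame: int
    last_frame: int


def segment_frames(frames: List[FrameLabel]) -> List[Segment]:
    """Collapse runs of identical per-frame labels into action segments."""
    segments = []
    index = 0
    for label, run in groupby(frames):
        length = len(list(run))
        segments.append(Segment(label, index, index + length - 1))
        index += length
    return segments


def infer_goal(segments: List[Segment]) -> FrameLabel:
    """Assumption for this sketch: the final segment is the goal."""
    if not segments:
        raise ValueError("no frames to analyze")
    return segments[-1].label


# Made-up labels for a 125-frame clip of someone pouring water into a pan.
frames = (
    [FrameLabel("hand", "grasp", "pitcher")] * 40
    + [FrameLabel("pitcher", "move", "pan")] * 25
    + [FrameLabel("pitcher", "pour", "pan")] * 60
)

segments = segment_frames(frames)
goal = infer_goal(segments)

for seg in segments:
    print(seg.first_frame, seg.last_frame, seg.label.action, seg.label.target)
print("goal:", goal.action, "into", goal.target)
```

In this toy version, 125 snapshots shrink to three segments (grasp, move, pour), and the pour into the pan is treated as the purpose of everything before it. A real system would have to cope with noisy labels and would need a much better way to decide which action is the goal.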

"We are trying to create a technology so that robots eventually can interact with humans," according to research-team member Cornelia Fermüller, who was quoted in a release from the University of Maryland Institute for Advanced Computer Studies (UMIACS), where the research was conducted. "[Robots] need to understand what humans are doing. For that, we need tools so that the robots can pick up a human’s actions and track them in real time. We are interested in understanding all of these components. How is an action performed by humans? How is it perceived by humans? What are the cognitive processes behind it?"

The robots were able to mimic the tasks shown in the YouTube videos with no additional programming or help from humans, as long as they had in front of them exactly the same implements that were used in the videos.

"Others have tried to copy the movements. Instead, we try to copy the goals. This is the breakthrough," Aloimonos said in the same announcement. "We chose cooking videos because everyone has done it and understands it. But cooking is complex in terms of manipulation, the steps involved and the tools you use. If you want to cut a cucumber, for example, you need to grab the knife, move it into place, make the cut and observe the results to make sure you did them properly."
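The difference between copying movements and copying goals can be sketched in a few lines. The example below is a hypothetical illustration built on the cucumber steps Aloimonos describes, not the team's software: the `Step` class, the `run_plan` function and the simulated `world` dictionary are invented for this sketch. Each step carries a check for the result it is supposed to produce, and the step is repeated until the check passes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

World = Dict[str, object]  # simulated state; a real robot would use its sensors


@dataclass
class Step:
    name: str
    act: Callable[[World], None]       # changes the simulated world
    achieved: Callable[[World], bool]  # "observe the results" check


def run_plan(steps: List[Step], world: World, max_attempts: int = 3) -> List[str]:
    """Run each step until its goal check passes or attempts run out."""
    log = []
    for step in steps:
        for attempt in range(1, max_attempts + 1):
            step.act(world)
            if step.achieved(world):
                log.append(f"{step.name}: ok after {attempt} attempt(s)")
                break
        else:
            raise RuntimeError(f"{step.name}: goal not reached")
    return log


world: World = {"holding": None, "knife_at": "rack", "slices": 0}

plan = [
    Step("grab the knife",
         lambda w: w.update(holding="knife"),
         lambda w: w["holding"] == "knife"),
    Step("move it into place",
         lambda w: w.update(knife_at="cucumber"),
         lambda w: w["knife_at"] == "cucumber"),
    # Assumed goal for the sketch: at least two slices, so one cut is not enough.
    Step("make the cut",
         lambda w: w.update(slices=w["slices"] + 1),
         lambda w: w["slices"] >= 2),
]

log = run_plan(plan, world)
print("\n".join(log))
```

Because the plan records what each step should accomplish rather than an exact arm trajectory, the cutting step simply repeats until enough slices exist. That is the general flavor of goal-based imitation, under the simplifying assumptions stated above.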

Industrial robots handling complex welding and lifting jobs on assembly lines are also able to complete a long, complex series of tasks, but have to be carefully programmed ahead of time to do them and are generally unable to respond to changes they didn't know about ahead of time.

The general-purpose Baxter robots, equipped with programming and hardware that allow them to observe, analyze and reproduce the behavior of robots or humans around them (or on TV), could learn more easily to do common chores or follow directions without any external programming at all.

"Instead of the long and expensive process of programming code to teach robots to do tasks, this research opens the potential for robots to learn much faster, at much lower cost and, to the extent they are authorized to do so, share that knowledge with other robots," Ghanadan said in DARPA's release.

"By having flexible robots, we're contributing to the next phase of automation," Aloimonos said. "This will be the next industrial revolution. We will have smart manufacturing environments and completely automated warehouses."
