Unlike human students, computers don’t seem to get bored or frustrated when a lesson is too easy or too hard. But just like humans, they do better when a lesson plan is “just right” for their level of skill. Coming up with the right curricula isn’t easy, though, so computer scientists wondered: What if they could make machines design their own?
That’s what researchers have done in several new studies, creating artificial intelligence (AI) that can figure out how best to teach itself. The work could speed learning in self-driving cars and household robots, and it might even help crack previously unsolved math problems.
In one of the new experiments, an AI program tries to quickly reach a destination by navigating a 2D grid populated with solid blocks. The “agent” improves its abilities through a process called reinforcement learning, a kind of trial and error.
To help it navigate increasingly complex worlds, the researchers—led by University of California (UC), Berkeley, graduate student Michael Dennis and Natasha Jaques, a research scientist at Google—considered two ways in which they could draw the maps. One method randomly distributed blocks; with it, the AI didn’t learn much. Another method remembered what the AI had struggled with in the past, and maximized difficulty accordingly. But that made the worlds too hard—and sometimes even impossible—to complete.
So the scientists created a setting that was just right, using a new approach they call PAIRED. First, they coupled their AI with a nearly identical one, albeit with a slightly different set of strengths, which they called the antagonist. Then, they had a third AI design worlds that were easy for the antagonist—but hard for the original protagonist. That kept the tasks just at the edge of the protagonist’s ability to solve. The designer, like the two agents, uses a neural network—a program inspired by the brain’s architecture—to learn its task over many trials.
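One way to picture the designer’s objective is as a gap between the two agents’ scores. The sketch below is a minimal illustration, not the team’s code: the `regret` function, the world names, and the episode returns are all hypothetical, and measuring the gap as the antagonist’s best episode minus the protagonist’s average is an assumption made for the example.

```python
from statistics import mean


def regret(antagonist_returns, protagonist_returns):
    """Gap between the antagonist's best episode and the protagonist's average.

    Assumed scoring rule for illustration only.
    """
    return max(antagonist_returns) - mean(protagonist_returns)


# Hypothetical episode returns (1.0 = reached the goal, 0.0 = failed).
candidate_worlds = {
    "trivial": {"antagonist": [1.0, 1.0], "protagonist": [1.0, 1.0]},
    "impossible": {"antagonist": [0.0, 0.0], "protagonist": [0.0, 0.0]},
    "frontier": {"antagonist": [1.0, 0.0], "protagonist": [0.0, 0.0]},
}

# Score every candidate world by how much the antagonist outperforms the protagonist.
scores = {
    name: regret(runs["antagonist"], runs["protagonist"])
    for name, runs in candidate_worlds.items()
}

# A designer rewarded by this gap would favor the highest-scoring world.
best_world = max(scores, key=scores.get)
print(scores, best_world)
```

In this toy setup the trivial and the impossible worlds both score zero, so a designer rewarded by the gap has no reason to build either. Only the world that the antagonist can solve and the protagonist cannot stands out.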
After training, the protagonist attempted a set of difficult mazes. When it had trained with either of the two older methods, it solved none of the new mazes. But after training with PAIRED, it solved one in five, the team reported last month at the Conference on Neural Information Processing Systems (NeurIPS). “We were excited by how PAIRED started working pretty much out of the gate,” Dennis says.
In another study, presented at a NeurIPS workshop, Jaques and colleagues at Google used a version of PAIRED to teach an AI agent to fill out web forms and book a flight. Whereas a simpler teaching method led it to fail nearly every time, an AI trained with the PAIRED method succeeded about 50% of the time.
The PAIRED approach is a clever way to get AI to learn, says Bart Selman, a computer scientist at Cornell University and president of the Association for the Advancement of Artificial Intelligence.
Selman and his colleagues presented another approach for so-called “autocurricula” at the meeting. Their task was a game called Sokoban, in which an AI agent must push blocks to target locations. But blocks can get stuck in dead ends, so success often requires planning hundreds of steps ahead. (Imagine rearranging large furniture in a small apartment.)
Their system creates a collection of simpler puzzles to train on, with fewer blocks and targets. Then, based on the recent performance of their AI, it selects puzzles that the agent only occasionally solves, effectively ratcheting the lesson plan to the right level. Sometimes, the right puzzles are hard to predict, Selman says. “The notion of what is a simpler task is not always obvious.”
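The selection step can be sketched as a filter on recent success rates. This is a rough illustration under stated assumptions rather than the Cornell team’s implementation: the function `select_training_puzzles`, the puzzle names, the win/loss records, and the 5% and 50% thresholds for “only occasionally solves” are all hypothetical.

```python
def select_training_puzzles(recent_results, low=0.05, high=0.5):
    """Return puzzles whose recent success rate falls inside [low, high].

    recent_results maps a puzzle id to a list of booleans (True = solved).
    The thresholds are assumed values, chosen only for this example.
    """
    selected = []
    for puzzle_id, outcomes in recent_results.items():
        if not outcomes:
            continue  # No attempts yet, so there is no success rate to judge.
        success_rate = sum(outcomes) / len(outcomes)
        if low <= success_rate <= high:
            selected.append(puzzle_id)
    return selected


# Hypothetical recent attempts by the agent on three simplified puzzles.
recent_results = {
    "two_blocks": [True] * 10,  # always solved: too easy
    "three_blocks": [True, False, False, False, False],  # solved 1 in 5
    "five_blocks": [False] * 5,  # never solved: too hard
}

selected = select_training_puzzles(recent_results)
print(selected)
```

Rerunning the filter as the agent’s record changes would shift the chosen puzzles toward harder ones, which is the ratcheting effect described above.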
The researchers tested their trained agent on 225 problems that no computer had ever solved. It cracked 80% of them, with about one-third of its success coming strictly from the novel training method. “That was just fun to see,” Selman says. He says he now receives astounded messages from AI researchers who’ve been working on the problems for decades. He hopes to apply the method next to unsolved math proofs.
Pieter Abbeel, a computer scientist at UC Berkeley, also showed at the meeting that autocurricula can help robots learn to manipulate objects. He says the approach could even be used for human students. “As an instructor, I think, ‘Hey, not every student needs the same homework exercise,’” Abbeel says, noting that AI could help tailor harder or easier material to a student’s needs. As for AI autocurricula, he says, “I think it’s going to be at the core of pretty much all reinforcement learning.”