A preprint published by Stanford University and Google researchers proposes an AI technique that predicts how goals were achieved, effectively learning to reverse-engineer tasks. They say it enables autonomous agents to learn through self-supervision, which some experts believe is a critical step toward truly intelligent systems.
Learning general policies for complex tasks often requires dealing with unfamiliar objects and scenes, and many methods rely on forms of supervision like expert demonstrations. But these entail significant tuning; demonstrations, for example, must be completed by experts many times over and recorded by special infrastructure.
That’s unlike the researchers’ proposed approach, time reversal as self-supervision (TRASS), which predicts “reversed trajectories” to create sources of supervision that lead to a goal or goals. A home robot could leverage it to learn tasks like turning on a computer, turning a knob, or opening a drawer, or chores like setting a dining table, making a bed, and cleaning a room.
“Most manipulation tasks that one would want to solve require some understanding of objects and how they interact. However understanding object relationships in a task-specific context is non-trivial,” explain the coauthors. “Consider the task [making a bed.] Starting from a made bed, random perturbations to the bed can crumple the blanket, which when reversed provides supervision on how to flatten and spread the blanket. Similarly, randomly perturbing objects in a clean [or] organized room will distribute the objects around the room. These trajectories reversed will show objects being placed back to their correct positions, strong supervision for room cleaning.”
TRASS works by collecting data given a set of goal states, applying random forces to disrupt the scene, and carefully recording each of the subsequent states. A TRASS-driven agent explores outward from the goal using no expert knowledge, collecting a trajectory that, when reversed, the agent can use to learn to return to the goal states. In this way, TRASS trains a model to predict the trajectories in reverse, so that the trained model can take the current state as input and provide supervision toward the goal in the form of a guiding trajectory of frames (but not actions).
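The data-collection idea can be illustrated with a minimal Python sketch. It is not the researchers’ implementation: the 2D point state, the `perturb` function standing in for random forces, and the helper names are all hypothetical, chosen only to show how a trajectory recorded outward from a goal becomes supervision once it is reversed.

```python
import random
from typing import List, Tuple

State = Tuple[float, float]  # toy stand-in for a scene; the real system uses image frames


def perturb(state: State, rng: random.Random, scale: float = 1.0) -> State:
    """Apply a random 'force' to the toy state (hypothetical stand-in for physics)."""
    x, y = state
    return (x + rng.uniform(-scale, scale), y + rng.uniform(-scale, scale))


def collect_reversed_trajectory(goal: State, steps: int, rng: random.Random) -> List[State]:
    """Start at the goal, disrupt it `steps` times, and return the states in reverse order."""
    forward = [goal]
    for _ in range(steps):
        forward.append(perturb(forward[-1], rng))
    return forward[::-1]  # reversed: begins at the most disrupted state, ends at the goal


def supervision_pairs(reversed_traj: List[State]) -> List[Tuple[State, State]]:
    """Pair each state with the next state on the way back to the goal (frames, not actions)."""
    return list(zip(reversed_traj[:-1], reversed_traj[1:]))


rng = random.Random(0)
goal = (0.0, 0.0)
traj = collect_reversed_trajectory(goal, steps=5, rng=rng)
pairs = supervision_pairs(traj)
```

Each pair maps a state to the state that follows it on the way back to the goal, which is the kind of target a trajectory-prediction model could be trained on. No actions are stored, matching the frames-only supervision described above.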
At test time, a TRASS-driven agent’s objective is to reach some state in a scene that satisfies certain specified goal conditions. At every step, the model recomputes a high-level guiding trajectory from the current state. Because the guiding trajectory decouples high-level planning from low-level control, it can be used as indirect supervision to produce a policy via model-based or model-free techniques.
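A toy Python sketch of that test-time loop follows. Everything in it is an assumption made for illustration: `predict_guiding_trajectory` is a hand-written interpolation standing in for the learned model (and is handed the goal directly, which the learned model would not need), and `low_level_step` is a simple bounded-step controller rather than the visual model predictive control used in the experiments.

```python
import math
from typing import List, Tuple

State = Tuple[float, float]


def distance(a: State, b: State) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def predict_guiding_trajectory(state: State, goal: State, horizon: int = 4) -> List[State]:
    """Hypothetical stand-in for the learned model: evenly spaced frames from state to goal."""
    return [
        (state[0] + (goal[0] - state[0]) * k / horizon,
         state[1] + (goal[1] - state[1]) * k / horizon)
        for k in range(1, horizon + 1)
    ]


def low_level_step(state: State, target: State, max_step: float = 0.5) -> State:
    """Hypothetical low-level controller: move toward the target frame by at most max_step."""
    gap = distance(state, target)
    if gap <= max_step:
        return target
    ratio = max_step / gap
    return (state[0] + (target[0] - state[0]) * ratio,
            state[1] + (target[1] - state[1]) * ratio)


def run_episode(start: State, goal: State, tol: float = 1e-3,
                max_iters: int = 100) -> Tuple[State, int]:
    """Replan every step: recompute the guiding trajectory, then track only its first frame."""
    state = start
    for step in range(max_iters):
        if distance(state, goal) <= tol:  # goal condition satisfied
            return state, step
        guide = predict_guiding_trajectory(state, goal)
        state = low_level_step(state, guide[0])
    return state, max_iters


final_state, steps_taken = run_episode(start=(3.0, 4.0), goal=(0.0, 0.0))
```

The planner only proposes frames and the controller only chases the next frame, so either piece could be swapped out independently, which is the decoupling the paragraph above describes.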
In experiments, the researchers applied TRASS to the problem of configuring physical Tetris-like blocks. Using a real-world robot, the Kuka IIWA, and a TRASS vision model trained in simulation and then transferred to the robot, they found that TRASS successfully paired blocks it had seen during training 75% of the time and blocks it hadn’t seen 50% of the time, over the course of 20 trials each.
TRASS has limitations. For example, it can’t be applied in cases where object deformations are irreversible (think cracking an egg, mixing two ingredients, or welding two parts together). But the researchers believe it can be extended by using exploration methods driven by state novelty, among other things.
“[O]ur method … is able to predict unknown goal states and the trajectory to reach them,” they write. “This method used with visual model predictive control is capable of assembling Tetris-style blocks with a physical robot using only visual inputs, while using no demonstrations or explicit supervision.”