Abstract-to-Executable Trajectory Translation for One-Shot Task Generalization

ICML 2023

  • UC San Diego

Abstract

Training long-horizon robotic policies in complex physical environments is essential for many applications, such as robotic manipulation. However, learning a policy that can generalize to unseen tasks is challenging. In this work, we propose to achieve one-shot task generalization by decoupling plan generation and plan execution. Specifically, our method solves complex long-horizon tasks in three steps: build a paired abstract environment by simplifying geometry and physics, generate abstract trajectories, and solve the original task with an abstract-to-executable trajectory translator. In the abstract environment, complex dynamics such as physical manipulation are removed, making abstract trajectories easy to generate. However, this introduces a large domain gap between abstract trajectories and the actual executed trajectories: abstract trajectories lack low-level details and are not aligned frame-to-frame with the executed trajectory. In a manner reminiscent of language translation, our approach leverages a seq-to-seq model to overcome this domain gap, enabling the low-level policy to follow the abstract trajectory. Experimental results on various unseen long-horizon tasks with different robot embodiments demonstrate the practicality of our method for achieving one-shot task generalization.


Videos of successful long-horizon block stacking tasks performed in the real world using a trajectory translation policy trained in simulation. Tasks are unseen and require manipulating blocks in locations beyond the original training distribution.

Trajectory Translation (TR2)

Prior approaches have adopted a one-shot imitation learning paradigm in which a policy observes a single demonstration and imitates it, offering great potential for task generalization by simply tailoring the demonstration. However, these demonstrations are typically human videos or low-level trajectories, both of which are infeasible to generate for difficult long-horizon tasks, or to re-generate in order to re-plan.

Thus, we simplify this problem and improve scalability by utilizing simple high-level agents that generate abstract trajectories. Abstract trajectories encode only simple information about the task at hand. Concretely, in our environments the abstract trajectories simply record the 2D/3D positions of objects in the scene over time, tasking the low-level agent with manipulating the world in a similar manner to achieve the desired task. These high-level agents are point masses that can easily move around in space and magically grasp objects, making abstract trajectory generation simple and scalable. Because the abstract trajectory lacks low-level details, it does not always align frame-to-frame with the actual executed trajectory, creating a domain gap. We bridge this gap with transformers, which can better discover the relationship between abstract and executed trajectories.
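To make the idea of an abstract trajectory concrete, here is a minimal sketch of a heuristic point-mass high-level agent. All names are illustrative stand-ins, not the paper's actual API: the agent moves in a straight line toward a goal, "magically" carrying the grasped object, and the recorded output is just a sequence of positions with no contact dynamics or robot kinematics.

```python
import numpy as np

def abstract_trajectory(start, goal, step_size=0.1):
    """Hypothetical point-mass high-level agent: move straight toward
    the goal, recording only the 2D position over time.

    No grasping details, contact physics, or robot morphology appear
    here -- that is exactly the information the low-level agent must
    discover on its own when following the trajectory.
    """
    pos = np.asarray(start, dtype=float)
    goal = np.asarray(goal, dtype=float)
    traj = [pos.copy()]
    while np.linalg.norm(goal - pos) > step_size:
        direction = (goal - pos) / np.linalg.norm(goal - pos)
        pos = pos + step_size * direction
        traj.append(pos.copy())
    traj.append(goal.copy())
    return np.stack(traj)  # shape (T, 2): the abstract trajectory

traj = abstract_trajectory([0.0, 0.0], [1.0, 0.0])
```

Because the high-level agent is this simple, new tasks can be defined by composing a few such straight-line segments (e.g. move to block, grasp, move to target), which is what makes abstract trajectory generation cheap and scalable.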

The use of abstract trajectories enables flexible definition of novel tasks: one simply writes a high-level agent that magically moves objects around in space. The transformer architecture enables the low-level agent to follow the abstract trajectory as closely as possible. Together, abstract trajectories and transformers enable TR2 to solve unseen long-horizon tasks. Evaluating our method on a navigation-based task and three manipulation tasks, we find that our agent achieves strong one-shot generalization to new tasks while remaining robust to intentional interventions and mistakes via re-planning.

Results

We show example translations of abstract trajectories to executed trajectories below, and detail the environments used and the domain gaps bridged. The left column of videos shows the abstract trajectory and the right column shows the executed trajectory. The high-level agents are written using simple heuristics and are represented as a point mass floating in 2D/3D space. As the abstract trajectory lacks low-level details, the low-level agent must learn and discover these details, such as object manipulation, and apply them while mimicking the abstract trajectory. Furthermore, re-planning is feasible because abstract trajectories can be re-generated to handle mistakes or external interventions.
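The execute-and-re-plan loop described above can be sketched as follows. This is a toy 1D stand-in, not the paper's implementation: `plan` plays the role of the heuristic high-level agent, and `follow` stands in for the learned transformer policy; the key point is that the abstract trajectory is re-generated from the *current* world state, which is what makes recovery from mistakes or interventions cheap.

```python
import numpy as np

class Toy1DEnv:
    """Minimal stand-in environment: agent at a 1D position, goal at 1.0."""
    def __init__(self):
        self.pos = 0.0
    def state(self):
        return self.pos
    def step(self, action):
        self.pos += np.clip(action, -0.2, 0.2)       # bounded low-level action
        return self.pos, abs(self.pos - 1.0) < 1e-2  # (obs, done)

def plan(state, goal=1.0, n=10):
    """Heuristic high-level agent: straight-line abstract trajectory
    from the current state to the goal."""
    return np.linspace(state, goal, n)

def follow(abstract_traj, pos):
    """Stand-in for the learned policy: step toward the first waypoint
    still ahead of the agent (the real policy is a transformer that
    attends over the whole abstract trajectory and executed history)."""
    ahead = abstract_traj[abstract_traj > pos + 1e-3]
    target = ahead[0] if len(ahead) else abstract_traj[-1]
    return target - pos

def translate_and_execute(env, max_replans=3, horizon=100):
    for _ in range(max_replans):
        abstract_traj = plan(env.state())  # re-plan from the current state
        for _ in range(horizon):
            obs, done = env.step(follow(abstract_traj, env.state()))
            if done:
                return True
    return False

success = translate_and_execute(Toy1DEnv())
```

If an external intervention knocks the agent off course, the next `plan` call simply starts from wherever the agent ended up, so no hand-crafted recovery behavior is needed.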


Abstract Trajectory

Executed Trajectory

Box Pusher
The training task is to control an agent (black box) to move a green box to a target (blue sphere). The high-level agent can magically grasp and thus drag the green box. However, the low-level agent is restricted to only pushing and must process the abstract trajectory to determine which direction to push the green box in. At test time there are obstacles observable only by the high-level agent.

Couch Moving
The training task is to move the couch shaped agent through a map of chambers and corners. The agent's couch morphology means that the agent must rotate in chambers ahead of time in order to go through corners. The high-level agent simply tells the low-level agent the path through the map, indicating where corners are, but it is up to the low-level agent to process this information to determine when to rotate in chambers. At test time, maps are longer and vary more.

Block Stacking
The training task is to stack a block with a robot arm. The high-level agent can magically grasp and release blocks anywhere and move easily through space. The low-level agent must process the abstract trajectory to determine where to pick up the block and where to stack it. At test time the agent has to stack multiple blocks in a row in locations beyond the training distribution.

Open Drawer
The training task is to open various drawers on cabinets with a mobile robot arm. The high-level agent can magically grasp and pull open drawers easily. The low-level agent must process the abstract trajectory to determine how to follow it and how to pull open the drawer. At test time the agent must open unseen drawers with unseen handles, as well as open more than one drawer on a cabinet.

Attention Analysis

To gain insight into how the transformer architecture enables the policy to solve environments more successfully, we analyze the learned attention in the Couch Moving environment.

In Couch Moving, the abstract trajectory is composed of high-level states that are simply the 2D positions of the high-level agent moving through the maze. Since these states correspond to locations in the maze, attention over them can be visualized directly on the map. The above video shows the attention over the abstract trajectory as the agent solves the task, with dark blue representing high attention and light blue representing minimal attention. We observe that when the agent is in a chamber, it learns to attend to the next chamber or the one after it, both of which indicate the orientation of the next corner. With this attention, the agent can correctly decide whether to rotate in order to move through the next corner. Results show that the transformer architecture achieves much higher success rates than LSTM architectures or sub-goal conditioned policies.
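The attention weights visualized above can be computed as in the following minimal sketch. This is one head of scaled dot-product attention written in plain numpy, not the paper's full model: the current low-level state acts as the query and the abstract-trajectory tokens act as the keys, yielding one weight per high-level step, which is exactly the quantity painted onto the Couch Moving map.

```python
import numpy as np

def attention_over_trajectory(query, keys):
    """Scaled dot-product attention weights of the current low-level
    state (query) over the abstract-trajectory tokens (keys).
    A stand-in for a single attention head of the full transformer."""
    scores = keys @ query / np.sqrt(len(query))  # one score per abstract step
    scores -= scores.max()                       # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()               # softmax: sums to 1

rng = np.random.default_rng(0)
traj_tokens = rng.normal(size=(20, 32))  # 20 high-level states, embed dim 32
state_token = rng.normal(size=32)        # current low-level state embedding
w = attention_over_trajectory(state_token, traj_tokens)
```

Plotting `w` at each control step over the 2D positions of the corresponding high-level states produces the kind of heatmap shown in the video.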

Bibtex

@inproceedings{tao2023tr2,
  title     = {Abstract-to-Executable Trajectory Translation for One-Shot Task Generalization}, 
  author    = {Tao, Stone and Li, Xiaochen and Mu, Tongzhou and Huang, Zhiao and Qin, Yuzhe and Su, Hao},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  year      = {2023},
}

Acknowledgements

Special thanks to Jiayuan Gu for feedback on figures, and additional members of the SU Lab for writing feedback.