Agents that can build temporally abstract representations of their environment are better able to understand their world and make plans on extended time scales, with limited computational power and modeling capacity. However, existing methods for automatically learning temporally abstract world models usually require millions of online environmental interactions and incentivize agents to reach every accessible environmental state, which is infeasible for most real-world robots both in terms of data efficiency and hardware safety. Our lab has developed an approach for simultaneously learning sets of skills and temporally abstract, skill-conditioned world models purely from offline data, enabling agents to perform zero-shot online planning of skill sequences for new tasks.
Learning skills and their induced temporally abstract dynamics
Our approach consists of an offline skill-learning phase and an online planning phase. In the former, given an offline dataset of trajectories, we jointly learn a latent space of skills, skill-conditioned low-level policies, and the temporally abstract dynamics induced by the execution of those low-level policies with a modified, variational inference based approach. In the latter, we plan over the learned space of skills to maximize a reward function we are given at test time.
Allowing for skills of variable length
We further extend this approach to build agents that can dynamically vary the coarseness of their temporal abstractions. We do so by assuming an underlying hierarchical data-generating process where skills of variable length are responsible for producing the trajectories in the dataset we are given. We then use an unsupervised, variational approach to learn the unobserved segmentation points between skills. Once we have a decomposition of trajectories into skills of variable length, we follow our previous work by learning a space of skills, policies, and abstract dynamics. In addition, we also learn a state-dependent termination condition that allows us to end the execution of a skill only after its intended goal has been accomplished and not just after a predetermined amount of time. We find that allowing for skills of variable lengths allows us to learn more accurate and precise temporally abstract dynamics and perform better in downstream planning than skills of fixed-length.
If you are interested in this project, or related areas in reinforcement learning, please contact Ben Freed (firstname.lastname@example.org).
 Freed, B., Venkatraman, S., Sartoretti, G.A., Schneider, J. & Choset, H.. (2023). Learning Temporally AbstractWorld Models without Online Experimentation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:10338-10356 Available from https://proceedings.mlr.press/v202/freed23a.html.