Kevin Luck

Contextual Latent World Models for Offline Meta Reinforcement Learning

Mar 03, 2026

Mohammadreza Nakheai, Aidan Scannell, Kevin Luck, Joni Pajarinen

Abstract: Offline meta-reinforcement learning seeks to learn policies that generalize across related tasks from fixed datasets. Context-based methods infer a task representation from transition histories, but learning effective task representations without supervision remains a challenge. In parallel, latent world models have demonstrated strong self-supervised representation learning through temporal consistency. We introduce contextual latent world models, which condition latent world models on inferred task representations and train them jointly with the context encoder. This enforces task-conditioned temporal consistency, yielding task representations that capture task-dependent dynamics rather than merely discriminating between tasks. Our method learns more expressive task representations and significantly improves generalization to unseen tasks across MuJoCo, Contextual-DeepMind Control, and Meta-World benchmarks.
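The core idea in this abstract is a latent world model whose dynamics are conditioned on a task vector inferred from transitions, trained with a temporal consistency loss. It can be sketched roughly as below. This is a minimal, hypothetical illustration, not the authors' implementation. Several pieces are assumptions made only for this sketch: the mean-pooled context encoder, the single-layer tanh maps, the helper names (`context_encoder`, `encode_obs`, `latent_dynamics`, `consistency_loss`), and the use of the encoded next observation as the prediction target. Only the forward loss is evaluated; there is no training loop.

```python
import math
import random


def linear(weights, x):
    # Dense layer without bias: returns weights @ x for a list-of-rows matrix.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def tanh_vec(v):
    return [math.tanh(x) for x in v]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def context_encoder(transitions, W_ctx):
    # Hypothetical encoder: embed each (s, a, r, s') and mean-pool, so the
    # task vector z does not depend on the order of the transitions.
    embs = [tanh_vec(linear(W_ctx, s + a + [r] + s2)) for s, a, r, s2 in transitions]
    return [sum(e[i] for e in embs) / len(embs) for i in range(len(embs[0]))]


def encode_obs(s, W_enc):
    # Maps an observation to a latent state h.
    return tanh_vec(linear(W_enc, s))


def latent_dynamics(h, a, z, W_dyn):
    # Task-conditioned latent transition: predicts the next latent from (h, a, z).
    return tanh_vec(linear(W_dyn, h + a + z))


def consistency_loss(transitions, W_ctx, W_enc, W_dyn):
    # Task-conditioned temporal consistency: the predicted next latent should
    # match the encoding of the observed next state. In a real training loop
    # the target would typically be held fixed (stop-gradient or target
    # network); here the loss is only evaluated.
    z = context_encoder(transitions, W_ctx)
    total = 0.0
    for s, a, _r, s2 in transitions:
        pred = latent_dynamics(encode_obs(s, W_enc), a, z, W_dyn)
        target = encode_obs(s2, W_enc)
        total += sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return total / len(transitions)


# Usage with random data (dimensions are arbitrary).
rng = random.Random(0)
S, A, H, Z = 3, 2, 4, 2  # state, action, latent, and task-vector sizes
W_ctx = random_matrix(Z, S + A + 1 + S, rng)
W_enc = random_matrix(H, S, rng)
W_dyn = random_matrix(H, H + A + Z, rng)
transitions = [
    ([rng.gauss(0, 1) for _ in range(S)], [rng.gauss(0, 1) for _ in range(A)],
     rng.gauss(0, 1), [rng.gauss(0, 1) for _ in range(S)])
    for _ in range(8)
]
z = context_encoder(transitions, W_ctx)
loss = consistency_loss(transitions, W_ctx, W_enc, W_dyn)
print(f"task vector size={len(z)}, consistency loss={loss:.4f}")
```

The task vector `z` feeds into `latent_dynamics`. If this loss were trained with an autodiff framework, its gradients would therefore also update the context encoder. That joint update is the coupling the abstract describes: `z` is rewarded for carrying whatever information helps predict task-dependent dynamics.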


Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning

Jul 12, 2022

Mhairi Dunion, Trevor McInroe, Kevin Luck, Josiah Hanna, Stefano V. Albrecht

Abstract: In real-world robotics applications, Reinforcement Learning (RL) agents are often unable to generalise to environment variations that were not observed during training. This issue is intensified for image-based RL where a change in one variable, such as the background colour, can change many pixels in the image, and in turn can change all values in the agent's internal representation of the image. To learn more robust representations, we introduce TEmporal Disentanglement (TED), a self-supervised auxiliary task that leads to disentangled representations using the sequential nature of RL observations. We find empirically that RL algorithms with TED as an auxiliary task adapt more quickly to changes in environment variables with continued training compared to state-of-the-art representation learning methods. Due to the disentangled structure of the representation, we also find that policies trained with TED generalise better to unseen values of variables irrelevant to the task (e.g. background colour) as well as unseen values of variables that affect the optimal policy (e.g. goal positions).
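The abstract does not spell out TED's objective. The sketch below therefore only illustrates the general idea of a self-supervised auxiliary task that exploits the sequential nature of observations. A classifier scores whether two latent vectors come from consecutive time steps, and it is trained with binary cross-entropy against randomly paired, non-consecutive latents. Everything specific here is a hypothetical choice for illustration, not TED's published formulation: the names `pair_logit`, `bce_with_logit` and `temporal_pair_loss`, the per-dimension weighted scorer, and the negative-sampling scheme.

```python
import math
import random


def pair_logit(x, y, w, b):
    # Hypothetical scorer: a small weighted squared change between two latents
    # yields a high logit for "these two came from consecutive time steps".
    return b - sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y))


def bce_with_logit(logit, label):
    # Numerically stable binary cross-entropy computed from a raw logit.
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def temporal_pair_loss(latents, w, b, rng):
    # Positives: (z_t, z_{t+1}); negatives: (z_t, z_j) with j drawn from the
    # non-neighbouring steps of the same trajectory. Requires len(latents) >= 3.
    T = len(latents)
    total = 0.0
    for t in range(T - 1):
        total += bce_with_logit(pair_logit(latents[t], latents[t + 1], w, b), 1.0)
        j = rng.choice([k for k in range(T) if k not in (t, t + 1)])
        total += bce_with_logit(pair_logit(latents[t], latents[j], w, b), 0.0)
    return total / (2 * (T - 1))


# Usage: a slowly drifting latent trajectory (a random walk with small steps).
rng = random.Random(0)
D, T = 4, 10
latents = [[0.0] * D]
for _ in range(T - 1):
    latents.append([x + rng.gauss(0.0, 0.1) for x in latents[-1]])
w, b = [1.0] * D, 0.5
loss = temporal_pair_loss(latents, w, b, rng)
print(f"temporal-pair loss={loss:.4f}")
```

In practice a loss of this kind would be added to the RL agent's main objective and back-propagated into the image encoder that produces the latents. Here the latents are supplied directly, so only the loss computation is shown.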
