Seungyul Han

Bridging Domain Gaps with Target-Aligned Generation for Offline Reinforcement Learning

May 13, 2026

Minung Kim, Jeongmo Kim, Gwanwoo Choi, Seungyul Han

Abstract: Cross-domain offline reinforcement learning aims to adapt a policy from a source domain to a target domain using only pre-collected datasets, where environment dynamics may differ. A key challenge is to leverage source data while reducing distributional mismatch, particularly when the target dataset is extremely limited. To address this, we propose Target-aligned Coverage Expansion (TCE), a framework that decides how source data should be used, either by directly incorporating target-near transitions or by expanding state coverage through target-aligned generation, guided by theoretical analysis. TCE builds on a dual score-based generative model to synthesize target-consistent transitions over an expanded state region. Extensive experiments across diverse cross-domain environments show that TCE consistently outperforms state-of-the-art cross-domain offline RL baselines.

Shaping Zero-Shot Coordination via State Blocking

May 12, 2026

Mingu Kang, Sunwoo Lee, Yonghyeon Jo, Seungyul Han

Abstract: Zero-shot coordination (ZSC) aims to enable agents to cooperate with independently trained partners without prior interaction, a key requirement for real-world multi-agent systems and human-AI collaboration. Existing approaches have largely emphasized increasing partner diversity during training, yet such strategies often fall short of achieving reliable generalization to unseen partners. We introduce State-Blocked Coordination (SBC), a simple yet effective framework that improves ZSC by inducing diverse interaction scenarios without direct environment modification. Specifically, SBC generates a family of virtual environments through state blocking, allowing agents to experience a wide range of suboptimal partner policies. Across multiple benchmarks, SBC demonstrates superior performance in zero-shot coordination, including strong generalization to human partners.

* 9 technical pages followed by references and appendix

Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning

Feb 19, 2026

Yonghyeon Jo, Sunwoo Lee, Seungyul Han

Abstract: Value decomposition is a core approach for cooperative multi-agent reinforcement learning (MARL). However, existing methods still rely on a single optimal action and struggle to adapt when the underlying value function shifts during training, often converging to suboptimal policies. To address this limitation, we propose Successive Sub-value Q-learning (S2Q), which learns multiple sub-value functions to retain alternative high-value actions. Incorporating these sub-value functions into a Softmax-based behavior policy, S2Q encourages persistent exploration and enables $Q^{\text{tot}}$ to adjust quickly to the changing optima. Experiments on challenging MARL benchmarks confirm that S2Q consistently outperforms various MARL algorithms, demonstrating improved adaptability and overall performance. Our code is available at https://github.com/hyeon1996/S2Q.

* 10 technical pages followed by references and appendix. Accepted to ICLR 2026
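
For readers unfamiliar with the term, a softmax (Boltzmann) behavior policy samples actions with probability proportional to the exponentiated action values, so lower-valued actions are still tried occasionally. The snippet below is a generic single-agent sketch of that idea only; it is not the S2Q algorithm, and the function names and the `temperature` parameter are illustrative assumptions rather than anything taken from the paper or its repository.

```python
import math
import random


def softmax_probs(q_values, temperature=1.0):
    """Turn action values into sampling probabilities (hypothetical helper)."""
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(q_values)
    exps = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(q_values, temperature=1.0, rng=random):
    """Sample an action index from the softmax distribution."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

A lower `temperature` makes the policy closer to greedy, while a higher one spreads probability more evenly across actions.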

Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning

Jun 26, 2025

Jaebak Hwang, Sanghyeon Lee, Jeongmo Kim, Seungyul Han

Abstract: Long-horizon goal-conditioned tasks pose fundamental challenges for reinforcement learning (RL), particularly when goals are distant and rewards are sparse. While hierarchical and graph-based methods offer partial solutions, they often suffer from subgoal infeasibility and inefficient planning. We introduce Strict Subgoal Execution (SSE), a graph-based hierarchical RL framework that enforces single-step subgoal reachability by structurally constraining high-level decision-making. To enhance exploration, SSE employs a decoupled exploration policy that systematically traverses underexplored regions of the goal space. Furthermore, a failure-aware path refinement step adjusts graph-based planning by dynamically updating edge costs according to observed low-level success rates, thereby improving subgoal reliability. Experimental results across diverse long-horizon benchmarks demonstrate that SSE consistently outperforms existing goal-conditioned RL and hierarchical RL approaches in both efficiency and success rate.

* 9 technical pages followed by references and appendix
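
The idea of adjusting edge costs according to observed success rates can be illustrated with a small planning sketch. The code below is a minimal illustration under stated assumptions, not the SSE implementation: the additive penalty form, the `penalty` weight, the optimistic treatment of untried edges, and all function and variable names are hypothetical choices made for this example.

```python
import heapq


def refined_cost(base_cost, successes, attempts, penalty=5.0):
    """Inflate an edge cost by its observed failure rate (assumed form)."""
    if attempts == 0:
        # Assumption: edges that were never tried keep their base cost.
        return base_cost
    failure_rate = 1.0 - successes / attempts
    return base_cost + penalty * failure_rate


def shortest_path(graph, stats, start, goal):
    """Dijkstra over refined costs.

    graph: {node: {neighbor: base_cost}}
    stats: {(node, neighbor): (successes, attempts)}
    Returns the node list from start to goal, or None if unreachable.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, base in graph.get(u, {}).items():
            s, n = stats.get((u, v), (0, 0))
            nd = d + refined_cost(base, s, n)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```

In this sketch, an edge that the low-level policy often fails to traverse becomes more expensive, so the planner routes around it once enough failures have been recorded.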

Center of Gravity-Guided Focusing Influence Mechanism for Multi-Agent Reinforcement Learning

Jun 24, 2025

Yisak Park, Sunwoo Lee, Seungyul Han

Abstract: Cooperative multi-agent reinforcement learning (MARL) under sparse rewards presents a fundamental challenge due to limited exploration and insufficient coordinated attention among agents. In this work, we propose the Focusing Influence Mechanism (FIM), a novel framework that enhances cooperation by directing agent influence toward task-critical elements, referred to as Center of Gravity (CoG) state dimensions, inspired by Clausewitz's military theory. FIM consists of three core components: (1) identifying CoG state dimensions based on their stability under agent behavior, (2) designing counterfactual intrinsic rewards to promote meaningful influence on these dimensions, and (3) encouraging persistent and synchronized focus through eligibility-trace-based credit accumulation. These mechanisms enable agents to induce more targeted and effective state transitions, facilitating robust cooperation even in extremely sparse reward settings. Empirical evaluations across diverse MARL benchmarks demonstrate that the proposed FIM significantly improves cooperative performance compared to baselines.

* 9 technical pages followed by references and appendix

PRISM: A Robust Framework for Skill-based Meta-Reinforcement Learning with Noisy Demonstrations

Feb 06, 2025

Sanghyeon Lee, Sangjun Bae, Yisak Park, Seungyul Han

Abstract: Meta-reinforcement learning (Meta-RL) facilitates rapid adaptation to unseen tasks but faces challenges in long-horizon environments. Skill-based approaches tackle this by decomposing state-action sequences into reusable skills and employing hierarchical decision-making. However, these methods are highly susceptible to noisy offline demonstrations, resulting in unstable skill learning and degraded performance. To overcome this, we propose Prioritized Refinement for Skill-Based Meta-RL (PRISM), a robust framework that integrates exploration near noisy data to generate online trajectories and combines them with offline data. Through prioritization, PRISM extracts high-quality data to learn task-relevant skills effectively. By addressing the impact of noise, our method ensures stable skill learning and achieves superior performance in long-horizon tasks, even with noisy and sub-optimal data.

* 8 pages main, 19 pages appendix with reference. Submitted to ICML 2025

Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning

Feb 05, 2025

Sunwoo Lee, Jaebak Hwang, Yonghyeon Jo, Seungyul Han

Abstract: Traditional robust methods in multi-agent reinforcement learning (MARL) often struggle against coordinated adversarial attacks in cooperative scenarios. To address this limitation, we propose the Wolfpack Adversarial Attack framework, inspired by wolf hunting strategies, which targets an initial agent and its assisting agents to disrupt cooperation. Additionally, we introduce the Wolfpack-Adversarial Learning for MARL (WALL) framework, which trains robust MARL policies to defend against the proposed Wolfpack attack by fostering system-wide collaboration. Experimental results underscore the devastating impact of the Wolfpack attack and the significant robustness improvements achieved by WALL.

* 8 pages main, 21 pages appendix with reference. Submitted to ICML 2025

Task-Aware Virtual Training: Enhancing Generalization in Meta-Reinforcement Learning for Out-of-Distribution Tasks

Feb 05, 2025

Jeongmo Kim, Yisak Park, Minung Kim, Seungyul Han

Abstract: Meta reinforcement learning aims to develop policies that generalize to unseen tasks sampled from a task distribution. While context-based meta-RL methods improve task representation using task latents, they often struggle with out-of-distribution (OOD) tasks. To address this, we propose Task-Aware Virtual Training (TAVT), a novel algorithm that accurately captures task characteristics for both training and OOD scenarios using metric-based representation learning. Our method successfully preserves task characteristics in virtual tasks and employs a state regularization technique to mitigate overestimation errors in state-varying environments. Numerical results demonstrate that TAVT significantly enhances generalization to OOD tasks across various MuJoCo and MetaWorld environments.

* 8 pages main paper, 19 pages appendices with reference, Submitted to ICML 2025

Domain-Invariant Per-Frame Feature Extraction for Cross-Domain Imitation Learning with Visual Observations

Feb 05, 2025

Minung Kim, Kawon Lee, Jungmo Kim, Sungho Choi, Seungyul Han

Abstract: Imitation learning (IL) enables agents to mimic expert behavior without reward signals but faces challenges in cross-domain scenarios with high-dimensional, noisy, and incomplete visual observations. To address this, we propose Domain-Invariant Per-Frame Feature Extraction for Imitation Learning (DIFF-IL), a novel IL method that extracts domain-invariant features from individual frames and adapts them into sequences to isolate and replicate expert behaviors. We also introduce a frame-wise time labeling technique to segment expert behaviors by timesteps and assign rewards aligned with temporal contexts, enhancing task performance. Experiments across diverse visual environments demonstrate the effectiveness of DIFF-IL in addressing complex visual tasks.

* 8 pages main, 19 pages appendix with reference. Submitted to ICML 2025

Exclusively Penalized Q-learning for Offline Reinforcement Learning

May 23, 2024

Junghyuk Yeom, Yonghyeon Jo, Jungmo Kim, Sanghyeon Lee, Seungyul Han

Abstract: Constraint-based offline reinforcement learning (RL) involves policy constraints or imposing penalties on the value function to mitigate overestimation errors caused by distributional shift. This paper focuses on a limitation in existing offline RL methods with a penalized value function, indicating the potential for underestimation bias due to unnecessary bias introduced in the value function. To address this concern, we propose Exclusively Penalized Q-learning (EPQ), which reduces estimation bias in the value function by selectively penalizing states that are prone to inducing estimation errors. Numerical results show that our method significantly reduces underestimation bias and improves performance in various offline control tasks compared to other offline RL methods.

* 9 technical pages followed by references and appendix
