Dengfeng Sun

Beyond Self-Play: Hierarchical Reasoning for Continuous Motion in Closed-Loop Traffic Simulation

May 09, 2026

Weifan Zhang, Xiaofeng Zhao, Adel Bazzi, Mingrui Li, Yifan Wei, Dengfeng Sun

Abstract: Closed-loop traffic simulation requires agents that are both scalable and behaviorally realistic. Recent self-play reinforcement learning approaches demonstrate strong scalability, but their equilibrium strategies fail to capture the socially aware behaviors of real human drivers. We propose a hierarchical architecture that goes beyond self-play by combining high-level multi-agent interaction reasoning with low-level continuous trajectory realization. Specifically, a Stackelberg-style Multi-Agent Reinforcement Learning (MARL) module generates interaction-aware intention commands. These commands condition a low-level continuous motion module, translating the strategic intent into physically consistent, scene-responsive control sequences. To mitigate distribution shift in closed-loop deployment, we introduce a hybrid co-training scheme combining MARL with auxiliary recovery supervision. Experiments on a SUMO-based urban network demonstrate that the proposed framework achieves superior control smoothness and safety compared to self-play and passive imitation baselines, while maintaining competitive traffic efficiency.

* Submitted to IEEE Robotics and Automation Letters (RA-L)

Receding Hamiltonian-Informed Optimal Neural Control and State Estimation for Closed-Loop Dynamical Systems

Nov 02, 2024

Josue N. Rivera, Dengfeng Sun

Abstract: This paper formalizes Hamiltonian-Informed Optimal Neural (Hion) controllers, a novel class of neural network-based controllers for dynamical systems and explicit non-linear model predictive control. Hion controllers estimate future states and compute optimal control inputs using Pontryagin's Maximum Principle. The proposed framework allows for customization of transient behavior, addressing limitations of existing methods. The Taylored Multi-Faceted Approach for Neural ODE and Optimal Control (T-mano) architecture facilitates training and ensures accurate state estimation. Optimal control strategies are demonstrated for both linear and non-linear dynamical systems.

Multi-Scale Cell Decomposition for Path Planning using Restrictive Routing Potential Fields

Aug 05, 2024

Josue N. Rivera, Dengfeng Sun

Abstract: In burgeoning domains such as urban goods distribution, the advent of aerial cargo transportation necessitates routing solutions that prioritize safety. This paper introduces Larp, a novel path planning framework that leverages the concept of restrictive potential fields to forge routes demonstrably safer than those derived from existing methods. The algorithm achieves this by segmenting a potential field into a hierarchy of cells, each with a designated restriction zone determined by obstacle proximity. While the primary impetus behind Larp is to enhance the safety of aerial pathways for cargo-carrying Unmanned Aerial Vehicles (UAVs), its utility extends to a wide array of path planning scenarios. Comparative analyses with both established and contemporary potential field-based methods reveal Larp's proficiency in maintaining a safe distance from restrictions and its adeptness in circumventing local minima.

Learning to Seek: Multi-Agent Online Source Seeking Against Non-Stochastic Disturbances

Apr 29, 2023

Bin Du, Kun Qian, Christian Claudel, Dengfeng Sun

Abstract: This paper leverages emerging learning techniques to devise a multi-agent online source seeking algorithm for unknown environments. Two aspects of our problem setup are of particular significance: i) the underlying environment is not only unknown, but also dynamically changing and perturbed by two types of non-stochastic disturbances; and ii) a group of agents is deployed and expected to cooperatively seek as many sources as possible. Correspondingly, a new discounted Kalman filter technique is developed to handle the non-stochastic disturbances, and a polytope-based notion of confidence bound is used to aid computationally efficient cooperation among multiple agents. Under standard assumptions on the unknown environment and the disturbances, our algorithm is shown to achieve sub-linear regret under both types of non-stochastic disturbances; both results are comparable to the state of the art. Numerical examples on a real-world pollution monitoring application are provided to demonstrate the effectiveness of our algorithm.

Multi-Robot Dynamical Source Seeking in Unknown Environments

Mar 19, 2021

Bin Du, Kun Qian, Christian Claudel, Dengfeng Sun

Abstract: This paper presents an algorithmic framework for distributed online source seeking, termed 'DoSS', with a multi-robot system in an unknown dynamical environment. Our algorithm, building on a novel concept called dummy confidence upper bound (D-UCB), simultaneously integrates estimation of the unknown environment and task planning for the multiple robots, and as a result drives the team of robots to a steady state in which multiple sources of interest are located. Unlike the standard UCB algorithm in the context of multi-armed bandits, D-UCB significantly reduces the computational complexity of solving the subproblems of multi-robot task planning. This also enables our 'DoSS' algorithm to be implemented in a distributed online manner. The performance of the algorithm is theoretically guaranteed by a sub-linear upper bound on the cumulative regret. Numerical results on a real-world methane emission seeking problem are also provided to demonstrate the effectiveness of the proposed algorithm.
