Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Karl H. Johansson

Advantage-Guided Diffusion for Model-Based Reinforcement Learning

Apr 10, 2026

Daniele Foffano, Arvid Eriksson, David Broman, Karl H. Johansson, Alexandre Proutiere

Abstract:Model-based reinforcement learning (MBRL) with autoregressive world models suffers from compounding errors, whereas diffusion world models mitigate this by generating trajectory segments jointly. However, existing diffusion guides are either policy-only, discarding value information, or reward-based, which becomes myopic when the diffusion horizon is short. We introduce Advantage-Guided Diffusion for MBRL (AGD-MBRL), which steers the reverse diffusion process using the agent's advantage estimates so that sampling concentrates on trajectories expected to yield higher long-term return beyond the generated window. We develop two guides: (i) Sigmoid Advantage Guidance (SAG) and (ii) Exponential Advantage Guidance (EAG). We prove that a diffusion model guided through SAG or EAG allows us to perform reweighted sampling of trajectories with weights increasing in state-action advantage-implying policy improvement under standard assumptions. Additionally, we show that the trajectories generated from AGD-MBRL follow an improved policy (that is, with higher value) compared to an unguided diffusion model. AGD integrates seamlessly with PolyGRAD-style architectures by guiding the state components while leaving action generation policy-conditioned, and requires no change to the diffusion training objective. On MuJoCo control tasks (HalfCheetah, Hopper, Walker2D and Reacher), AGD-MBRL improves sample efficiency and final return over PolyGRAD, an online Diffuser-style reward guide, and model-free baselines (PPO/TRPO), in some cases by a margin of 2x. These results show that advantage-aware guidance is a simple, effective remedy for short-horizon myopia in diffusion-model MBRL.

Via

Access Paper or Ask Questions

MeanFlow Meets Control: Scaling Sampled-Data Control for Swarms

Mar 20, 2026

Anqi Dong, Yongxin Chen, Karl H. Johansson, Johan Karlsson

Abstract:Steering large-scale swarms in only a few control updates is challenging because real systems operate in sampled-data form: control inputs are updated intermittently and applied over finite intervals. In this regime, the natural object is not an instantaneous velocity field, but a finite-window control quantity that captures the system response over each sampling interval. Inspired by MeanFlow, we introduce a control-space learning framework for swarm steering under linear time-invariant dynamics. The learned object is the coefficient that parameterizes the finite-horizon minimum-energy control over each interval. We show that this coefficient admits both an integral representation and a local differential identity along bridge trajectories, which leads to a simple stop-gradient training objective. At implementation time, the learned coefficient is used directly in sampled-data updates, so the prescribed dynamics and actuation map are respected by construction. The resulting framework provides a scalable approach to few-step swarm steering that is consistent with the sampled-data structure of real control systems.

Via

Access Paper or Ask Questions

RHYME-XT: A Neural Operator for Spatiotemporal Control Systems

Mar 18, 2026

Marijn Ruiter, Miguel Aguiar, Jake Rap, Karl H. Johansson, Amritam Das

Abstract:We propose RHYME-XT, an operator-learning framework for surrogate modeling of spatiotemporal control systems governed by input-affine nonlinear partial integro-differential equations (PIDEs) with localized rhythmic behavior. RHYME-XT uses a Galerkin projection to approximate the infinite-dimensional PIDE on a learned finite-dimensional subspace with spatial basis functions parameterized by a neural network. This yields a projected system of ODEs driven by projected inputs. Instead of integrating this non-autonomous system, we directly learn its flow map using an architecture for learning flow functions, avoiding costly computations while obtaining a continuous-time and discretization-invariant representation. Experiments on a neural field PIDE show that RHYME-XT outperforms a state-of-the-art neural operator and is able to transfer knowledge effectively across models trained on different datasets, through a fine-tuning process.

* 6 pages, 5 figures. Submitted to IEEE Control Systems Letters (L-CSS) and CDC 2026

Via

Access Paper or Ask Questions

Efficient Tail-Aware Generative Optimization via Flow Model Fine-Tuning

Feb 18, 2026

Zifan Wang, Riccardo De Santi, Xiaoyu Mo, Michael M. Zavlanos, Andreas Krause, Karl H. Johansson

Abstract:Fine-tuning pre-trained diffusion and flow models to optimize downstream utilities is central to real-world deployment. Existing entropy-regularized methods primarily maximize expected reward, providing no mechanism to shape tail behavior. However, tail control is often essential: the lower tail determines reliability by limiting low-reward failures, while the upper tail enables discovery by prioritizing rare, high-reward outcomes. In this work, we present Tail-aware Flow Fine-Tuning (TFFT), a principled and efficient distributional fine-tuning algorithm based on the Conditional Value-at-Risk (CVaR). We address two distinct tail-shaping goals: right-CVaR for seeking novel samples in the high-reward tail and left-CVaR for controlling worst-case samples in the low-reward tail. Unlike prior approaches that rely on non-linear optimization, we leverage the variational dual formulation of CVaR to decompose it into a decoupled two-stage procedure: a lightweight one-dimensional threshold optimization step, and a single entropy-regularized fine-tuning process via a specific pseudo-reward. This decomposition achieves CVaR fine-tuning efficiently with computational cost comparable to standard expected fine-tuning methods. We demonstrate the effectiveness of TFFT across illustrative experiments, high-dimensional text-to-image generation, and molecular design.

* 33 pages

Via

Access Paper or Ask Questions

Risk-Averse Learning with Varying Risk Levels

Dec 28, 2025

Siyi Wang, Zifan Wang, Karl H. Johansson

Abstract:In safety-critical decision-making, the environment may evolve over time, and the learner adjusts its risk level accordingly. This work investigates risk-averse online optimization in dynamic environments with varying risk levels, employing Conditional Value-at-Risk (CVaR) as the risk measure. To capture the dynamics of the environment and risk levels, we employ the function variation metric and introduce a novel risk-level variation metric. Two information settings are considered: a first-order scenario, where the learner observes both function values and their gradients; and a zeroth-order scenario, where only function evaluations are available. For both cases, we develop risk-averse learning algorithms with a limited sampling budget and analyze their dynamic regret bounds in terms of function variation, risk-level variation, and the total number of samples. The regret analysis demonstrates the adaptability of the algorithms in non-stationary and risk-sensitive settings. Finally, numerical experiments are presented to demonstrate the efficacy of the methods.

Via

Access Paper or Ask Questions

Wasserstein Distributionally Robust Nash Equilibrium Seeking with Heterogeneous Data: A Lagrangian Approach

Nov 18, 2025

Zifan Wang, Georgios Pantazis, Sergio Grammatico, Michael M. Zavlanos, Karl H. Johansson

Abstract:We study a class of distributionally robust games where agents are allowed to heterogeneously choose their risk aversion with respect to distributional shifts of the uncertainty. In our formulation, heterogeneous Wasserstein ball constraints on each distribution are enforced through a penalty function leveraging a Lagrangian formulation. We then formulate the distributionally robust Nash equilibrium problem and show that under certain assumptions it is equivalent to a finite-dimensional variational inequality problem with a strongly monotone mapping. We then design an approximate Nash equilibrium seeking algorithm and prove convergence of the average regret to a quantity that diminishes with the number of iterations, thus learning the desired equilibrium up to an a priori specified accuracy. Numerical simulations corroborate our theoretical findings.

Via

Access Paper or Ask Questions

Source-Guided Flow Matching

Aug 20, 2025

Zifan Wang, Alice Harting, Matthieu Barreau, Michael M. Zavlanos, Karl H. Johansson

Abstract:Guidance of generative models is typically achieved by modifying the probability flow vector field through the addition of a guidance field. In this paper, we instead propose the Source-Guided Flow Matching (SGFM) framework, which modifies the source distribution directly while keeping the pre-trained vector field intact. This reduces the guidance problem to a well-defined problem of sampling from the source distribution. We theoretically show that SGFM recovers the desired target distribution exactly. Furthermore, we provide bounds on the Wasserstein error for the generated distribution when using an approximate sampler of the source distribution and an approximate vector field. The key benefit of our approach is that it allows the user to flexibly choose the sampling method depending on their specific problem. To illustrate this, we systematically compare different sampling methods and discuss conditions for asymptotically exact guidance. Moreover, our framework integrates well with optimal flow matching models since the straight transport map generated by the vector field is preserved. Experimental results on synthetic 2D benchmarks, image datasets, and physics-informed generative tasks demonstrate the effectiveness and flexibility of the proposed framework.

Via

Access Paper or Ask Questions

Beyond Line-of-Sight: Cooperative Localization Using Vision and V2X Communication

Jul 28, 2025

Annika Wong, Zhiqi Tang, Frank J. Jiang, Karl H. Johansson, Jonas Mårtensson

Abstract:Accurate and robust localization is critical for the safe operation of Connected and Automated Vehicles (CAVs), especially in complex urban environments where Global Navigation Satellite System (GNSS) signals are unreliable. This paper presents a novel vision-based cooperative localization algorithm that leverages onboard cameras and Vehicle-to-Everything (V2X) communication to enable CAVs to estimate their poses, even in occlusion-heavy scenarios such as busy intersections. In particular, we propose a novel decentralized observer for a group of connected agents that includes landmark agents (static or moving) in the environment with known positions and vehicle agents that need to estimate their poses (both positions and orientations). Assuming that (i) there are at least three landmark agents in the environment, (ii) each vehicle agent can measure its own angular and translational velocities as well as relative bearings to at least three neighboring landmarks or vehicles, and (iii) neighboring vehicles can communicate their pose estimates, each vehicle can estimate its own pose using the proposed decentralized observer. We prove that the origin of the estimation error is locally exponentially stable under the proposed observer, provided that the minimal observability conditions are satisfied. Moreover, we evaluate the proposed approach through experiments with real 1/10th-scale connected vehicles and large-scale simulations, demonstrating its scalability and validating the theoretical guarantees in practical scenarios.

* Accepted at the 2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC 2025)

Via

Access Paper or Ask Questions

Personalized and Resilient Distributed Learning Through Opinion Dynamics

May 20, 2025

Luca Ballotta, Nicola Bastianello, Riccardo M. G. Ferrari, Karl H. Johansson

Abstract:In this paper, we address two practical challenges of distributed learning in multi-agent network systems, namely personalization and resilience. Personalization is the need of heterogeneous agents to learn local models tailored to their own data and tasks, while still generalizing well; on the other hand, the learning process must be resilient to cyberattacks or anomalous training data to avoid disruption. Motivated by a conceptual affinity between these two requirements, we devise a distributed learning algorithm that combines distributed gradient descent and the Friedkin-Johnsen model of opinion dynamics to fulfill both of them. We quantify its convergence speed and the neighborhood that contains the final learned models, which can be easily controlled by tuning the algorithm parameters to enforce a more personalized/resilient behavior. We numerically showcase the effectiveness of our algorithm on synthetic and real-world distributed learning tasks, where it achieves high global accuracy both for personalized models and with malicious agents compared to standard strategies.

* This work has been submitted to IEEE for possible publication

Via

Access Paper or Ask Questions

A Centralized Planning and Distributed Execution Method for Shape Filling with Homogeneous Mobile Robots

Mar 28, 2025

Shuqing Liu, Rong Su, Karl H. Johansson

Figure 1 for A Centralized Planning and Distributed Execution Method for Shape Filling with Homogeneous Mobile Robots

Figure 2 for A Centralized Planning and Distributed Execution Method for Shape Filling with Homogeneous Mobile Robots

Figure 3 for A Centralized Planning and Distributed Execution Method for Shape Filling with Homogeneous Mobile Robots

Figure 4 for A Centralized Planning and Distributed Execution Method for Shape Filling with Homogeneous Mobile Robots

Abstract:Nature has inspired humans in different ways. The formation behavior of animals can perform tasks that exceed individual capability. For example, army ants could transverse gaps by forming bridges, and fishes could group up to protect themselves from predators. The pattern formation task is essential in a multiagent robotic system because it usually serves as the initial configuration of downstream tasks, such as collective manipulation and adaptation to various environments. The formation of complex shapes, especially hollow shapes, remains an open question. Traditional approaches either require global coordinates for each robot or are prone to failure when attempting to close the hole due to accumulated localization errors. Inspired by the ribbon idea introduced in the additive self-assembly algorithm by the Kilobot team, we develop a two-stage algorithm that does not require global coordinates information and effectively forms shapes with holes. In this paper, we investigate the partitioning of the shape using ribbons in a hexagonal lattice setting and propose the add-subtract algorithm based on the movement sequence induced by the ribbon structure. This advancement opens the door to tasks requiring complex pattern formations, such as the assembly of nanobots for medical applications involving intricate structures and the deployment of robots along the boundaries of areas of interest. We also provide simulation results on complex shapes, an analysis of the robustness as well as a proof of correctness of the proposed algorithm.

Via

Access Paper or Ask Questions