Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Gaurav S. Sukhatme

GAMMS: Graph based Adversarial Multiagent Modeling Simulator

Feb 04, 2026

Rohan Patil, Jai Malegaonkar, Xiao Jiang, Andre Dion, Gaurav S. Sukhatme, Henrik I. Christensen

Abstract:As intelligent systems and multi-agent coordination become increasingly central to real-world applications, there is a growing need for simulation tools that are both scalable and accessible. Existing high-fidelity simulators, while powerful, are often computationally expensive and ill-suited for rapid prototyping or large-scale agent deployments. We present GAMMS (Graph based Adversarial Multiagent Modeling Simulator), a lightweight yet extensible simulation framework designed to support fast development and evaluation of agent behavior in environments that can be represented as graphs. GAMMS emphasizes five core objectives: scalability, ease of use, integration-first architecture, fast visualization feedback, and real-world grounding. It enables efficient simulation of complex domains such as urban road networks and communication systems, supports integration with external tools (e.g., machine learning libraries, planning solvers), and provides built-in visualization with minimal configuration. GAMMS is agnostic to policy type, supporting heuristic, optimization-based, and learning-based agents, including those using large language models. By lowering the barrier to entry for researchers and enabling high-performance simulations on standard hardware, GAMMS facilitates experimentation and innovation in multi-agent systems, autonomous planning, and adversarial modeling. The framework is open-source and available at https://github.com/GAMMSim/GAMMS/

Via

Access Paper or Ask Questions

PIP-LLM: Integrating PDDL-Integer Programming with LLMs for Coordinating Multi-Robot Teams Using Natural Language

Oct 26, 2025

Guangyao Shi, Yuwei Wu, Vijay Kumar, Gaurav S. Sukhatme

Abstract:Enabling robot teams to execute natural language commands requires translating high-level instructions into feasible, efficient multi-robot plans. While Large Language Models (LLMs) combined with Planning Domain Description Language (PDDL) offer promise for single-robot scenarios, existing approaches struggle with multi-robot coordination due to brittle task decomposition, poor scalability, and low coordination efficiency. We introduce PIP-LLM, a language-based coordination framework that consists of PDDL-based team-level planning and Integer Programming (IP) based robot-level planning. PIP-LLMs first decomposes the command by translating the command into a team-level PDDL problem and solves it to obtain a team-level plan, abstracting away robot assignment. Each team-level action represents a subtask to be finished by the team. Next, this plan is translated into a dependency graph representing the subtasks' dependency structure. Such a dependency graph is then used to guide the robot-level planning, in which each subtask node will be formulated as an IP-based task allocation problem, explicitly optimizing travel costs and workload while respecting robot capabilities and user-defined constraints. This separation of planning from assignment allows PIP-LLM to avoid the pitfalls of syntax-based decomposition and scale to larger teams. Experiments across diverse tasks show that PIP-LLM improves plan success rate, reduces maximum and average travel costs, and achieves better load balancing compared to state-of-the-art baselines.

Via

Access Paper or Ask Questions

Latent Weight Diffusion: Generating Policies from Trajectories

Oct 17, 2024

Shashank Hegde, Gautam Salhotra, Gaurav S. Sukhatme

Abstract:With the increasing availability of open-source robotic data, imitation learning has emerged as a viable approach for both robot manipulation and locomotion. Currently, large generalized policies are trained to predict controls or trajectories using diffusion models, which have the desirable property of learning multimodal action distributions. However, generalizability comes with a cost - namely, larger model size and slower inference. Further, there is a known trade-off between performance and action horizon for Diffusion Policy (i.e., diffusing trajectories): fewer diffusion queries accumulate greater trajectory tracking errors. Thus, it is common practice to run these models at high inference frequency, subject to robot computational constraints. To address these limitations, we propose Latent Weight Diffusion (LWD), a method that uses diffusion to learn a distribution over policies for robotic tasks, rather than over trajectories. Our approach encodes demonstration trajectories into a latent space and then decodes them into policies using a hypernetwork. We employ a diffusion denoising model within this latent space to learn its distribution. We demonstrate that LWD can reconstruct the behaviors of the original policies that generated the trajectory dataset. LWD offers the benefits of considerably smaller policy networks during inference and requires fewer diffusion model queries. When tested on the Metaworld MT10 benchmark, LWD achieves a higher success rate compared to a vanilla multi-task policy, while using models up to ~18x smaller during inference. Additionally, since LWD generates closed-loop policies, we show that it outperforms Diffusion Policy in long action horizon settings, with reduced diffusion queries during rollout.

Via

Access Paper or Ask Questions

Zero-Shot Generalization of Vision-Based RL Without Data Augmentation

Oct 09, 2024

Sumeet Batra, Gaurav S. Sukhatme

Figure 1 for Zero-Shot Generalization of Vision-Based RL Without Data Augmentation

Figure 2 for Zero-Shot Generalization of Vision-Based RL Without Data Augmentation

Figure 3 for Zero-Shot Generalization of Vision-Based RL Without Data Augmentation

Figure 4 for Zero-Shot Generalization of Vision-Based RL Without Data Augmentation

Abstract:Generalizing vision-based reinforcement learning (RL) agents to novel environments remains a difficult and open challenge. Current trends are to collect large-scale datasets or use data augmentation techniques to prevent overfitting and improve downstream generalization. However, the computational and data collection costs increase exponentially with the number of task variations and can destabilize the already difficult task of training RL agents. In this work, we take inspiration from recent advances in computational neuroscience and propose a model, Associative Latent DisentAnglement (ALDA), that builds on standard off-policy RL towards zero-shot generalization. Specifically, we revisit the role of latent disentanglement in RL and show how combining it with a model of associative memory achieves zero-shot generalization on difficult task variations without relying on data augmentation. Finally, we formally show that data augmentation techniques are a form of weak disentanglement and discuss the implications of this insight.

Via

Access Paper or Ask Questions

Resilient and Adaptive Replanning for Multi-Robot Target Tracking with Sensing and Communication Danger Zones

Sep 17, 2024

Peihan Li, Yuwei Wu, Jiazhen Liu, Gaurav S. Sukhatme, Vijay Kumar, Lifeng Zhou

Figure 1 for Resilient and Adaptive Replanning for Multi-Robot Target Tracking with Sensing and Communication Danger Zones

Figure 2 for Resilient and Adaptive Replanning for Multi-Robot Target Tracking with Sensing and Communication Danger Zones

Figure 3 for Resilient and Adaptive Replanning for Multi-Robot Target Tracking with Sensing and Communication Danger Zones

Figure 4 for Resilient and Adaptive Replanning for Multi-Robot Target Tracking with Sensing and Communication Danger Zones

Abstract:Multi-robot collaboration for target tracking presents significant challenges in hazardous environments, including addressing robot failures, dynamic priority changes, and other unpredictable factors. Moreover, these challenges are increased in adversarial settings if the environment is unknown. In this paper, we propose a resilient and adaptive framework for multi-robot, multi-target tracking in environments with unknown sensing and communication danger zones. The damages posed by these zones are temporary, allowing robots to track targets while accepting the risk of entering dangerous areas. We formulate the problem as an optimization with soft chance constraints, enabling real-time adjustments to robot behavior based on varying types of dangers and failures. An adaptive replanning strategy is introduced, featuring different triggers to improve group performance. This approach allows for dynamic prioritization of target tracking and risk aversion or resilience, depending on evolving resources and real-time conditions. To validate the effectiveness of the proposed method, we benchmark and evaluate it across multiple scenarios in simulation and conduct several real-world experiments.

Via

Access Paper or Ask Questions

AutoMate: Specialist and Generalist Assembly Policies over Diverse Geometries

Jul 10, 2024

Bingjie Tang, Iretiayo Akinola, Jie Xu, Bowen Wen, Ankur Handa, Karl Van Wyk, Dieter Fox, Gaurav S. Sukhatme, Fabio Ramos, Yashraj Narang

Figure 1 for AutoMate: Specialist and Generalist Assembly Policies over Diverse Geometries

Figure 2 for AutoMate: Specialist and Generalist Assembly Policies over Diverse Geometries

Figure 3 for AutoMate: Specialist and Generalist Assembly Policies over Diverse Geometries

Figure 4 for AutoMate: Specialist and Generalist Assembly Policies over Diverse Geometries

Abstract:Robotic assembly for high-mixture settings requires adaptivity to diverse parts and poses, which is an open challenge. Meanwhile, in other areas of robotics, large models and sim-to-real have led to tremendous progress. Inspired by such work, we present AutoMate, a learning framework and system that consists of 4 parts: 1) a dataset of 100 assemblies compatible with simulation and the real world, along with parallelized simulation environments for policy learning, 2) a novel simulation-based approach for learning specialist (i.e., part-specific) policies and generalist (i.e., unified) assembly policies, 3) demonstrations of specialist policies that individually solve 80 assemblies with 80% or higher success rates in simulation, as well as a generalist policy that jointly solves 20 assemblies with an 80%+ success rate, and 4) zero-shot sim-to-real transfer that achieves similar (or better) performance than simulation, including on perception-initialized assembly. The key methodological takeaway is that a union of diverse algorithms from manufacturing engineering, character animation, and time-series analysis provides a generic and robust solution for a diverse range of robotic assembly problems.To our knowledge, AutoMate provides the first simulation-based framework for learning specialist and generalist policies over a wide range of assemblies, as well as the first system demonstrating zero-shot sim-to-real transfer over such a range.

Via

Access Paper or Ask Questions

Inverse Risk-sensitive Multi-Robot Task Allocation

Jun 14, 2024

Guangyao Shi, Gaurav S. Sukhatme

Figure 1 for Inverse Risk-sensitive Multi-Robot Task Allocation

Figure 2 for Inverse Risk-sensitive Multi-Robot Task Allocation

Figure 3 for Inverse Risk-sensitive Multi-Robot Task Allocation

Abstract:We consider a new variant of the multi-robot task allocation problem - Inverse Risk-sensitive Multi-Robot Task Allocation (IR-MRTA). "Forward" MRTA - the process of deciding which robot should perform a task given the reward (cost)-related parameters, is widely studied in the multi-robot literature. In this setting, the reward (cost)-related parameters are assumed to be already known: parameters are first fixed offline by domain experts, followed by coordinating robots online. What if we need these parameters to be adjusted by non-expert human supervisors who oversee the robots during tasks to adapt to new situations? We are interested in the case where the human supervisor's perception of the allocation risk may change and suggest different allocations for robots compared to that from the MRTA algorithm. In such cases, the robots need to change the parameters of the allocation problem based on evolving human preferences. We study such problems through the lens of inverse task allocation, i.e., the process of finding parameters given solutions to the problem. Specifically, we propose a new formulation IR-MRTA, in which we aim to find a new set of parameters of the human behavioral risk model that minimally deviates from the current MRTA parameters and can make a greedy task allocation algorithm allocate robot resources in line with those suggested by humans. We show that even in the simple case such a problem is a non-convex optimization problem. We propose a Branch $\&$ Bound algorithm (BB-IR-MRTA) to solve such problems. In numerical simulations of a case study on multi-robot target capture, we demonstrate how to use BB-IR-MRTA and we show that the proposed algorithm achieves significant advantages in running time and peak memory usage compared to a brute-force baseline.

Via

Access Paper or Ask Questions

Active Signal Emitter Placement In Complex Environments

May 04, 2024

Christopher E. Denniston, Baskın Şenbaşlar, Gaurav S. Sukhatme

Abstract:Placement of electromagnetic signal emitting devices, such as light sources, has important usage in for signal coverage tasks. Automatic placement of these devices is challenging because of the complex interaction of the signal and environment due to reflection, refraction and scattering. In this work, we iteratively improve the placement of these devices by interleaving device placement and sensing actions, correcting errors in the model of the signal propagation. To this end, we propose a novel factor-graph based belief model which combines the measurements taken by the robot and an analytical light propagation model. This model allows accurately modelling the uncertainty of the light propagation with respect to the obstacles, which greatly improves the informative path planning routine. Additionally, we propose a method for determining when to re-plan the emitter placements to balance a trade-off between information about a specific configuration and frequent updating of the configuration. This method incorporates the uncertainty from belief model to adaptively determine when re-configuration is needed. We find that our system has a 9.8% median error reduction compared to a baseline system in simulations in the most difficult environment. We also run on-robot tests and determine that our system performs favorably compared to the baseline.

* Submitted to RA-L

Via

Access Paper or Ask Questions

Multi-Robot Target Tracking with Sensing and Communication Danger Zones

Apr 11, 2024

Jiazhen Li, Peihan Li, Yuwei Wu, Gaurav S. Sukhatme, Vijay Kumar, Lifeng Zhou

Figure 1 for Multi-Robot Target Tracking with Sensing and Communication Danger Zones

Figure 2 for Multi-Robot Target Tracking with Sensing and Communication Danger Zones

Figure 3 for Multi-Robot Target Tracking with Sensing and Communication Danger Zones

Figure 4 for Multi-Robot Target Tracking with Sensing and Communication Danger Zones

Abstract:Multi-robot target tracking finds extensive applications in different scenarios, such as environmental surveillance and wildfire management, which require the robustness of the practical deployment of multi-robot systems in uncertain and dangerous environments. Traditional approaches often focus on the performance of tracking accuracy with no modeling and assumption of the environments, neglecting potential environmental hazards which result in system failures in real-world deployments. To address this challenge, we investigate multi-robot target tracking in the adversarial environment considering sensing and communication attacks with uncertainty. We design specific strategies to avoid different danger zones and proposed a multi-agent tracking framework under the perilous environment. We approximate the probabilistic constraints and formulate practical optimization strategies to address computational challenges efficiently. We evaluate the performance of our proposed methods in simulations to demonstrate the ability of robots to adjust their risk-aware behaviors under different levels of environmental uncertainty and risk confidence. The proposed method is further validated via real-world robot experiments where a team of drones successfully track dynamic ground robots while being risk-aware of the sensing and/or communication danger zones.

Via

Access Paper or Ask Questions

Inverse Submodular Maximization with Application to Human-in-the-Loop Multi-Robot Multi-Objective Coverage Control

Mar 16, 2024

Guangyao Shi, Gaurav S. Sukhatme

Figure 1 for Inverse Submodular Maximization with Application to Human-in-the-Loop Multi-Robot Multi-Objective Coverage Control

Figure 2 for Inverse Submodular Maximization with Application to Human-in-the-Loop Multi-Robot Multi-Objective Coverage Control

Figure 3 for Inverse Submodular Maximization with Application to Human-in-the-Loop Multi-Robot Multi-Objective Coverage Control

Figure 4 for Inverse Submodular Maximization with Application to Human-in-the-Loop Multi-Robot Multi-Objective Coverage Control

Abstract:We consider a new type of inverse combinatorial optimization, Inverse Submodular Maximization (ISM), for human-in-the-loop multi-robot coordination. Forward combinatorial optimization, defined as the process of solving a combinatorial problem given the reward (cost)-related parameters, is widely used in multi-robot coordination. In the standard pipeline, the reward (cost)-related parameters are designed offline by domain experts first and then these parameters are utilized for coordinating robots online. What if we need to change these parameters by non-expert human supervisors who watch over the robots during tasks to adapt to some new requirements? We are interested in the case where human supervisors can suggest what actions to take, and the robots need to change the internal parameters based on such suggestions. We study such problems from the perspective of inverse combinatorial optimization, i.e., the process of finding parameters given solutions to the problem. Specifically, we propose a new formulation for ISM, in which we aim to find a new set of parameters that minimally deviate from the current parameters and can make the greedy algorithm output actions the same as those suggested by humans. We show that such problems can be formulated as a Mixed Integer Quadratic Program (MIQP). However, MIQP involves exponentially many binary variables, making it intractable for the existing solver when the problem size is large. We propose a new algorithm under the Branch $\&$ Bound paradigm to solve such problems. In numerical simulations, we demonstrate how to use ISM in multi-robot multi-objective coverage control, and we show that the proposed algorithm achieves significant advantages in running time and peak memory usage compared to directly using an existing solver.

* submitted to IROS2024

Via

Access Paper or Ask Questions