Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wei Pan

VLM-SFD: VLM-Assisted Siamese Flow Diffusion Framework for Dual-Arm Cooperative Manipulation

Jun 16, 2025

Jiaming Chen, Yiyu Jiang, Aoshen Huang, Yang Li, Wei Pan

Abstract:Dual-arm cooperative manipulation holds great promise for tackling complex real-world tasks that demand seamless coordination and adaptive dynamics. Despite substantial progress in learning-based motion planning, most approaches struggle to generalize across diverse manipulation tasks and adapt to dynamic, unstructured environments, particularly in scenarios involving interactions between two objects such as assembly, tool use, and bimanual grasping. To address these challenges, we introduce a novel VLM-Assisted Siamese Flow Diffusion (VLM-SFD) framework for efficient imitation learning in dual-arm cooperative manipulation. The proposed VLM-SFD framework exhibits outstanding adaptability, significantly enhancing the ability to rapidly adapt and generalize to diverse real-world tasks from only a minimal number of human demonstrations. Specifically, we propose a Siamese Flow Diffusion Network (SFDNet) employs a dual-encoder-decoder Siamese architecture to embed two target objects into a shared latent space, while a diffusion-based conditioning process-conditioned by task instructions-generates two-stream object-centric motion flows that guide dual-arm coordination. We further design a dynamic task assignment strategy that seamlessly maps the predicted 2D motion flows into 3D space and incorporates a pre-trained vision-language model (VLM) to adaptively assign the optimal motion to each robotic arm over time. Experiments validate the effectiveness of the proposed method, demonstrating its ability to generalize to diverse manipulation tasks while maintaining high efficiency and adaptability. The code and demo videos are publicly available on our project website https://sites.google.com/view/vlm-sfd/.

Via

Access Paper or Ask Questions

From Grunts to Grammar: Emergent Language from Cooperative Foraging

May 19, 2025

Maytus Piriyajitakonkij, Rujikorn Charakorn, Weicheng Tao, Wei Pan, Mingfei Sun, Cheston Tan, Mengmi Zhang

Abstract:Early cavemen relied on gestures, vocalizations, and simple signals to coordinate, plan, avoid predators, and share resources. Today, humans collaborate using complex languages to achieve remarkable results. What drives this evolution in communication? How does language emerge, adapt, and become vital for teamwork? Understanding the origins of language remains a challenge. A leading hypothesis in linguistics and anthropology posits that language evolved to meet the ecological and social demands of early human cooperation. Language did not arise in isolation, but through shared survival goals. Inspired by this view, we investigate the emergence of language in multi-agent Foraging Games. These environments are designed to reflect the cognitive and ecological constraints believed to have influenced the evolution of communication. Agents operate in a shared grid world with only partial knowledge about other agents and the environment, and must coordinate to complete games like picking up high-value targets or executing temporally ordered actions. Using end-to-end deep reinforcement learning, agents learn both actions and communication strategies from scratch. We find that agents develop communication protocols with hallmark features of natural language: arbitrariness, interchangeability, displacement, cultural transmission, and compositionality. We quantify each property and analyze how different factors, such as population size and temporal dependencies, shape specific aspects of the emergent language. Our framework serves as a platform for studying how language can evolve from partial observability, temporal reasoning, and cooperative goals in embodied multi-agent settings. We will release all data, code, and models publicly.

Via

Access Paper or Ask Questions

Versatile, Robust, and Explosive Locomotion with Rigid and Articulated Compliant Quadrupeds

Apr 17, 2025

Jiatao Ding, Peiyu Yang, Fabio Boekel, Jens Kober, Wei Pan, Matteo Saveriano, Cosimo Della Santina

Abstract:Achieving versatile and explosive motion with robustness against dynamic uncertainties is a challenging task. Introducing parallel compliance in quadrupedal design is deemed to enhance locomotion performance, which, however, makes the control task even harder. This work aims to address this challenge by proposing a general template model and establishing an efficient motion planning and control pipeline. To start, we propose a reduced-order template model-the dual-legged actuated spring-loaded inverted pendulum with trunk rotation-which explicitly models parallel compliance by decoupling spring effects from active motor actuation. With this template model, versatile acrobatic motions, such as pronking, froggy jumping, and hop-turn, are generated by a dual-layer trajectory optimization, where the singularity-free body rotation representation is taken into consideration. Integrated with a linear singularity-free tracking controller, enhanced quadrupedal locomotion is achieved. Comparisons with the existing template model reveal the improved accuracy and generalization of our model. Hardware experiments with a rigid quadruped and a newly designed compliant quadruped demonstrate that i) the template model enables generating versatile dynamic motion; ii) parallel elasticity enhances explosive motion. For example, the maximal pronking distance, hop-turn yaw angle, and froggy jumping distance increase at least by 25%, 15% and 25%, respectively; iii) parallel elasticity improves the robustness against dynamic uncertainties, including modelling errors and external disturbances. For example, the allowable support surface height variation increases by 100% for robust froggy jumping.

* 20 pages, 25 figures

Via

Access Paper or Ask Questions

Vision-Language Model Predictive Control for Manipulation Planning and Trajectory Generation

Apr 07, 2025

Jiaming Chen, Wentao Zhao, Ziyu Meng, Donghui Mao, Ran Song, Wei Pan, Wei Zhang

Figure 1 for Vision-Language Model Predictive Control for Manipulation Planning and Trajectory Generation

Figure 2 for Vision-Language Model Predictive Control for Manipulation Planning and Trajectory Generation

Figure 3 for Vision-Language Model Predictive Control for Manipulation Planning and Trajectory Generation

Figure 4 for Vision-Language Model Predictive Control for Manipulation Planning and Trajectory Generation

Abstract:Model Predictive Control (MPC) is a widely adopted control paradigm that leverages predictive models to estimate future system states and optimize control inputs accordingly. However, while MPC excels in planning and control, it lacks the capability for environmental perception, leading to failures in complex and unstructured scenarios. To address this limitation, we introduce Vision-Language Model Predictive Control (VLMPC), a robotic manipulation planning framework that integrates the perception power of vision-language models (VLMs) with MPC. VLMPC utilizes a conditional action sampling module that takes a goal image or language instruction as input and leverages VLM to generate candidate action sequences. These candidates are fed into a video prediction model that simulates future frames based on the actions. In addition, we propose an enhanced variant, Traj-VLMPC, which replaces video prediction with motion trajectory generation to reduce computational complexity while maintaining accuracy. Traj-VLMPC estimates motion dynamics conditioned on the candidate actions, offering a more efficient alternative for long-horizon tasks and real-time applications. Both VLMPC and Traj-VLMPC select the optimal action sequence using a VLM-based hierarchical cost function that captures both pixel-level and knowledge-level consistency between the current observation and the task input. We demonstrate that both approaches outperform existing state-of-the-art methods on public benchmarks and achieve excellent performance in various real-world robotic manipulation tasks. Code is available at https://github.com/PPjmchen/VLMPC.

Via

Access Paper or Ask Questions

TAR: Teacher-Aligned Representations via Contrastive Learning for Quadrupedal Locomotion

Mar 26, 2025

Amr Mousa, Neil Karavis, Michele Caprio, Wei Pan, Richard Allmendinger

Abstract:Quadrupedal locomotion via Reinforcement Learning (RL) is commonly addressed using the teacher-student paradigm, where a privileged teacher guides a proprioceptive student policy. However, key challenges such as representation misalignment between the privileged teacher and the proprioceptive-only student, covariate shift due to behavioral cloning, and lack of deployable adaptation lead to poor generalization in real-world scenarios. We propose Teacher-Aligned Representations via Contrastive Learning (TAR), a framework that leverages privileged information with self-supervised contrastive learning to bridge this gap. By aligning representations to a privileged teacher in simulation via contrastive objectives, our student policy learns structured latent spaces and exhibits robust generalization to Out-of-Distribution (OOD) scenarios, surpassing the fully privileged "Teacher". Results showed accelerated training by 2x compared to state-of-the-art baselines to achieve peak performance. OOD scenarios showed better generalization by 40 percent on average compared to existing methods. Additionally, TAR transitions seamlessly into learning during deployment without requiring privileged states, setting a new benchmark in sample-efficient, adaptive locomotion and enabling continual fine-tuning in real-world scenarios. Open-source code and videos are available at https://ammousa.github.io/TARLoco/.

* This work has been submitted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025 for review

Via

Access Paper or Ask Questions

Explosive Jumping with Rigid and Articulated Soft Quadrupeds via Example Guided Reinforcement Learning

Mar 20, 2025

Georgios Apostolides, Wei Pan, Jens Kober, Cosimo Della Santina, Jiatao Ding

Abstract:Achieving controlled jumping behaviour for a quadruped robot is a challenging task, especially when introducing passive compliance in mechanical design. This study addresses this challenge via imitation-based deep reinforcement learning with a progressive training process. To start, we learn the jumping skill by mimicking a coarse jumping example generated by model-based trajectory optimization. Subsequently, we generalize the learned policy to broader situations, including various distances in both forward and lateral directions, and then pursue robust jumping in unknown ground unevenness. In addition, without tuning the reward much, we learn the jumping policy for a quadruped with parallel elasticity. Results show that using the proposed method, i) the robot learns versatile jumps by learning only from a single demonstration, ii) the robot with parallel compliance reduces the landing error by 11.1%, saves energy cost by 15.2% and reduces the peak torque by 15.8%, compared to the rigid robot without parallel elasticity, iii) the robot can perform jumps of variable distances with robustness against ground unevenness (maximal 4cm height perturbations) using only proprioceptive perception.

* 8 pages, 9 figures, submitted to IROS2025

Via

Access Paper or Ask Questions

SpikingSoft: A Spiking Neuron Controller for Bio-inspired Locomotion with Soft Snake Robots

Jan 31, 2025

Chuhan Zhang, Cong Wang, Wei Pan, Cosimo Della Santina

Figure 1 for SpikingSoft: A Spiking Neuron Controller for Bio-inspired Locomotion with Soft Snake Robots

Figure 2 for SpikingSoft: A Spiking Neuron Controller for Bio-inspired Locomotion with Soft Snake Robots

Figure 3 for SpikingSoft: A Spiking Neuron Controller for Bio-inspired Locomotion with Soft Snake Robots

Figure 4 for SpikingSoft: A Spiking Neuron Controller for Bio-inspired Locomotion with Soft Snake Robots

Abstract:Inspired by the dynamic coupling of moto-neurons and physical elasticity in animals, this work explores the possibility of generating locomotion gaits by utilizing physical oscillations in a soft snake by means of a low-level spiking neural mechanism. To achieve this goal, we introduce the Double Threshold Spiking neuron model with adjustable thresholds to generate varied output patterns. This neuron model can excite the natural dynamics of soft robotic snakes, and it enables distinct movements, such as turning or moving forward, by simply altering the neural thresholds. Finally, we demonstrate that our approach, termed SpikingSoft, naturally pairs and integrates with reinforcement learning. The high-level agent only needs to adjust the two thresholds to generate complex movement patterns, thus strongly simplifying the learning of reactive locomotion. Simulation results demonstrate that the proposed architecture significantly enhances the performance of the soft snake robot, enabling it to achieve target objectives with a 21.6% increase in success rate, a 29% reduction in time to reach the target, and smoother movements compared to the vanilla reinforcement learning controllers or Central Pattern Generator controller acting in torque space.

* 8th IEEE-RAS International Conference on Soft Robotics

Via

Access Paper or Ask Questions

Toward Scalable Multirobot Control: Fast Policy Learning in Distributed MPC

Dec 27, 2024

Xinglong Zhang, Wei Pan, Cong Li, Xin Xu, Xiangke Wang, Ronghua Zhang, Dewen Hu

Figure 1 for Toward Scalable Multirobot Control: Fast Policy Learning in Distributed MPC

Figure 2 for Toward Scalable Multirobot Control: Fast Policy Learning in Distributed MPC

Figure 3 for Toward Scalable Multirobot Control: Fast Policy Learning in Distributed MPC

Figure 4 for Toward Scalable Multirobot Control: Fast Policy Learning in Distributed MPC

Abstract:Distributed model predictive control (DMPC) is promising in achieving optimal cooperative control in multirobot systems (MRS). However, real-time DMPC implementation relies on numerical optimization tools to periodically calculate local control sequences online. This process is computationally demanding and lacks scalability for large-scale, nonlinear MRS. This article proposes a novel distributed learning-based predictive control (DLPC) framework for scalable multirobot control. Unlike conventional DMPC methods that calculate open-loop control sequences, our approach centers around a computationally fast and efficient distributed policy learning algorithm that generates explicit closed-loop DMPC policies for MRS without using numerical solvers. The policy learning is executed incrementally and forward in time in each prediction interval through an online distributed actor-critic implementation. The control policies are successively updated in a receding-horizon manner, enabling fast and efficient policy learning with the closed-loop stability guarantee. The learned control policies could be deployed online to MRS with varying robot scales, enhancing scalability and transferability for large-scale MRS. Furthermore, we extend our methodology to address the multirobot safe learning challenge through a force field-inspired policy learning approach. We validate our approach's effectiveness, scalability, and efficiency through extensive experiments on cooperative tasks of large-scale wheeled robots and multirotor drones. Our results demonstrate the rapid learning and deployment of DMPC policies for MRS with scales up to 10,000 units.

* 26 pages, 19 figures

Via

Access Paper or Ask Questions

LoKO: Low-Rank Kalman Optimizer for Online Fine-Tuning of Large Models

Oct 15, 2024

Hossein Abdi, Mingfei Sun, Andi Zhang, Samuel Kaski, Wei Pan

Figure 1 for LoKO: Low-Rank Kalman Optimizer for Online Fine-Tuning of Large Models

Figure 2 for LoKO: Low-Rank Kalman Optimizer for Online Fine-Tuning of Large Models

Figure 3 for LoKO: Low-Rank Kalman Optimizer for Online Fine-Tuning of Large Models

Figure 4 for LoKO: Low-Rank Kalman Optimizer for Online Fine-Tuning of Large Models

Abstract:Training large models with millions or even billions of parameters from scratch incurs substantial computational costs. Parameter Efficient Fine-Tuning (PEFT) methods, particularly Low-Rank Adaptation (LoRA), address this challenge by adapting only a reduced number of parameters to specific tasks with gradient-based optimizers. In this paper, we cast PEFT as an optimal filtering/state estimation problem and present Low-Rank Kalman Optimizer (LoKO) to estimate the optimal trainable parameters in an online manner. We leverage the low-rank decomposition in LoRA to significantly reduce matrix sizes in Kalman iterations and further capitalize on a diagonal approximation of the covariance matrix to effectively decrease computational complexity from quadratic to linear in the number of trainable parameters. Moreover, we discovered that the initialization of the covariance matrix within the Kalman algorithm and the accurate estimation of the observation noise covariance are the keys in this formulation, and we propose robust approaches that work well across a vast range of well-established computer vision and language models. Our results show that LoKO converges with fewer iterations and yields better performance models compared to commonly used optimizers with LoRA in both image classifications and language tasks. Our study opens up the possibility of leveraging the Kalman filter as an effective optimizer for the online fine-tuning of large models.

Via

Access Paper or Ask Questions

Modular Adaptive Aerial Manipulation under Unknown Dynamic Coupling Forces

Oct 10, 2024

Rishabh Dev Yadav, Swati Dantu, Wei Pan, Sihao Sun, Spandan Roy, Simone Baldi

Figure 1 for Modular Adaptive Aerial Manipulation under Unknown Dynamic Coupling Forces

Figure 2 for Modular Adaptive Aerial Manipulation under Unknown Dynamic Coupling Forces

Figure 3 for Modular Adaptive Aerial Manipulation under Unknown Dynamic Coupling Forces

Figure 4 for Modular Adaptive Aerial Manipulation under Unknown Dynamic Coupling Forces

Abstract:Successful aerial manipulation largely depends on how effectively a controller can tackle the coupling dynamic forces between the aerial vehicle and the manipulator. However, this control problem has remained largely unsolved as the existing control approaches either require precise knowledge of the aerial vehicle/manipulator inertial couplings, or neglect the state-dependent uncertainties especially arising during the interaction phase. This work proposes an adaptive control solution to overcome this long standing control challenge without any a priori knowledge of the coupling dynamic terms. Additionally, in contrast to the existing adaptive control solutions, the proposed control framework is modular, that is, it allows independent tuning of the adaptive gains for the vehicle position sub-dynamics, the vehicle attitude sub-dynamics, and the manipulator sub-dynamics. Stability of the closed loop under the proposed scheme is derived analytically, and real-time experiments validate the effectiveness of the proposed scheme over the state-of-the-art approaches.

* IEEE/ASME Transactions on Mechatronics, 2024

Via

Access Paper or Ask Questions