Heyu Guo

MuseVLA: An Adaptive Multimodal Sensing Vision-Language-Action Model for Robotic Manipulation

Jun 16, 2026

Xingyuming Liu, Ruichun Ma, Heyu Guo, Qixiu Li, Qingwen Yang, Lin Luo, Shiqi Jiang, Chenren Xu, Jiaolong Yang, Baining Guo

Abstract: Humans naturally leverage diverse sensing modalities to interact with the physical world, while most Vision-Language-Action (VLA) models for robotics rely solely on RGB observations. This limits their ability to perceive physical properties that are difficult or impossible to infer from RGB cameras, such as temperature, sound, or radar response. We present MuseVLA, an adaptive multimodal sensing VLA model that integrates novel sensors as on-demand tools for robotic manipulation. Given a task instruction and visual context, MuseVLA first generates a sensor token and a target description that select the sensing modality to invoke and what to attend to, analogous to a tool call with arguments. It then converts the selected sensor measurement into a grounded sensor image, a unified intermediate representation that encodes heterogeneous readings for multimodal fusion and action generation. This design decouples sensor-specific processing from the VLA backbone, enabling efficient integration of diverse modalities. To reduce the need for expensive multisensory robot datasets, we further introduce a data synthesis pipeline that augments existing RGB video datasets with grounded sensor images, enabling generalization to unseen sensor-guided tasks. We evaluate MuseVLA on a real-world robot across challenging dexterous hand manipulation tasks that require multimodal sensing inputs, including temperature-guided pick-and-place, audio-driven object search, and radar-assisted hidden object retrieval. MuseVLA achieves an average success rate of 80.6%, significantly outperforming RGB-only and multisensory VLA baselines, and exhibits strong zero-shot capabilities on unseen tasks.
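The "tool call with arguments" analogy in the abstract can be illustrated with a small dispatch sketch. This is not the MuseVLA implementation: `SensorCall`, `render_grounded_image`, `SENSORS`, `invoke`, the value ranges, and the bounding-box grounding are all hypothetical names and assumptions chosen for illustration. The sketch only shows the general idea of routing a selected modality to a sensor-specific renderer that emits one shared image format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]        # H x W grid of values in [0, 1]
Box = Tuple[int, int, int, int]  # (row0, col0, row1, col1), end-exclusive


@dataclass
class SensorCall:
    """Hypothetical stand-in for a sensor token plus target description."""
    sensor: str  # which modality to invoke, e.g. "thermal"
    target: str  # what to attend to, e.g. "the hot cup"


def render_grounded_image(value: float, lo: float, hi: float,
                          box: Box, shape: Tuple[int, int]) -> Image:
    """Paint a normalized scalar reading into the target region of a blank image."""
    level = min(1.0, max(0.0, (value - lo) / (hi - lo)))
    rows, cols = shape
    r0, c0, r1, c1 = box
    return [[level if (r0 <= r < r1 and c0 <= c < c1) else 0.0
             for c in range(cols)] for r in range(rows)]


# Registry of sensor-specific renderers (ranges are made-up assumptions).
# Every renderer returns the same image format, so downstream code never
# handles raw, sensor-specific data.
SENSORS: Dict[str, Callable[[float, Box, Tuple[int, int]], Image]] = {
    "thermal": lambda v, box, shape: render_grounded_image(v, 0.0, 100.0, box, shape),
    "audio": lambda v, box, shape: render_grounded_image(v, 30.0, 90.0, box, shape),
}


def invoke(call: SensorCall, reading: float, box: Box,
           shape: Tuple[int, int] = (4, 4)) -> Image:
    """Dispatch a sensor call to its renderer.

    `box` is assumed to be the already-grounded location of `call.target`;
    how a text description is grounded to a region is outside this sketch.
    """
    return SENSORS[call.sensor](reading, box, shape)


call = SensorCall(sensor="thermal", target="the hot cup")
image = invoke(call, reading=75.0, box=(1, 1, 3, 3))
```

Adding a modality in this sketch means registering one more renderer, which mirrors the decoupling the abstract describes between sensor-specific processing and the shared backbone.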

Panoptic: True Joint mmWave Communication and Sensing with Compressive Sidelobe Forming

Apr 08, 2025

Heyu Guo, Ruiyi Shen, Florian Kosterhon, Yasaman Ghasempour

Abstract: The integration of communication and sensing functions within mmWave systems has gained attention due to the potential for enhanced passive sensing and improved communication reliability. State-of-the-art techniques separate these two functions in frequency, hardware, or time, i.e., they send known preambles for channel sensing or unknown symbols for communication. In this paper, we introduce Panoptic, a novel system architecture for integrated communication and sensing that shares the same hardware, frequency, and time resources. Panoptic jointly detects unknown symbols and channel components from data-modulated signals. The core idea is a new beam manipulation technique, which we call compressive sidelobe forming, that maintains a directional mainlobe toward the intended communication nodes while acquiring unique spatial information through pseudorandom sidelobe perturbations. We implemented Panoptic on 60 GHz mmWave radios and conducted extensive over-the-air experiments. Our results show that Panoptic achieves a reflector angular localization error of less than 2° while simultaneously supporting mmWave data communication with a negligible BER penalty compared with conventional communication-only mmWave systems.

* Submitted to IEEE Journal on Selected Areas in Communications
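The idea of keeping the mainlobe fixed while perturbing sidelobes pseudorandomly can be illustrated with an idealized array model. The sketch below is not Panoptic's algorithm: it assumes a uniform linear array with half-wavelength spacing and unconstrained complex weights, and it uses a simple projection trick (removing the component of a random perturbation along the mainlobe steering vector). The function names `steering`, `response`, and `perturbed_weights` are hypothetical.

```python
import cmath
import math
import random
from typing import List


def steering(n_elems: int, theta: float) -> List[complex]:
    """Steering vector of a half-wavelength-spaced uniform linear array."""
    return [cmath.exp(1j * math.pi * n * math.sin(theta)) for n in range(n_elems)]


def response(weights: List[complex], theta: float) -> complex:
    """Array response w^H a(theta) in direction theta."""
    a = steering(len(weights), theta)
    return sum(w.conjugate() * x for w, x in zip(weights, a))


def perturbed_weights(n_elems: int, theta0: float, eps: float,
                      rng: random.Random) -> List[complex]:
    """Mainlobe weights plus a random perturbation orthogonal to a(theta0).

    Because the perturbation has no component along a(theta0), the response
    toward theta0 is unchanged while the response elsewhere is randomized.
    """
    a0 = steering(n_elems, theta0)
    base = [x / n_elems for x in a0]  # unit gain toward theta0
    p = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_elems)]
    proj = sum(x.conjugate() * y for x, y in zip(a0, p)) / n_elems
    p_perp = [y - proj * x for x, y in zip(a0, p)]
    return [b + eps * q for b, q in zip(base, p_perp)]


theta0 = math.radians(20.0)
w_a = perturbed_weights(16, theta0, 0.05, random.Random(0))
w_b = perturbed_weights(16, theta0, 0.05, random.Random(1))
mainlobe_gain = abs(response(w_a, theta0))  # stays at unit gain
```

Each seed yields a different sidelobe pattern with the same mainlobe response, which is the property a receiver could exploit to collect distinct spatial measurements across transmissions. Real phased arrays typically impose phase-only or quantized weights, which this sketch ignores.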

Radarize: Large-Scale Radar SLAM for Indoor Environments

Nov 19, 2023

Emerson Sie, Xinyu Wu, Heyu Guo, Deepak Vasisht

Abstract: We present Radarize, a self-contained SLAM pipeline for indoor environments that uses only a low-cost commodity single-chip mmWave radar. Our radar-native approach leverages phenomena unique to radio frequencies, such as Doppler shift-based odometry, to improve performance. We evaluate our method on a large-scale dataset of 146 trajectories spanning 4 campus buildings, totaling approximately 4680 m of travel distance. Our results show that our method outperforms state-of-the-art radar-based approaches by approximately 5x in terms of odometry and 8x in terms of end-to-end SLAM, as measured by absolute trajectory error (ATE), without the need for additional sensors such as IMUs or wheel odometry.
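To give a sense of how Doppler shifts can yield odometry, the sketch below shows a generic textbook formulation, not Radarize's own pipeline. It assumes a planar radar, static reflectors at known azimuths, and a sign convention in which positive radial speed means the reflector is receding, so that each measurement satisfies v_r = -(v_x cos θ + v_y sin θ). The function name `ego_velocity` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def ego_velocity(azimuths_rad: Sequence[float],
                 radial_speeds: Sequence[float]) -> Tuple[float, float]:
    """Least-squares radar velocity (vx, vy) from Doppler of static reflectors.

    Solves the 2x2 normal equations of  -v_r = vx*cos(theta) + vy*sin(theta).
    """
    scc = scs = sss = bc = bs = 0.0
    for theta, vr in zip(azimuths_rad, radial_speeds):
        c, s = math.cos(theta), math.sin(theta)
        scc += c * c
        scs += c * s
        sss += s * s
        bc += c * (-vr)
        bs += s * (-vr)
    det = scc * sss - scs * scs
    if abs(det) < 1e-12:
        raise ValueError("reflector directions do not constrain both axes")
    vx = (bc * sss - scs * bs) / det
    vy = (scc * bs - scs * bc) / det
    return vx, vy


# Synthetic check: a radar moving at (1.0, 0.2) m/s past five static points.
angles = [-0.8, -0.3, 0.1, 0.5, 0.9]
speeds = [-(1.0 * math.cos(t) + 0.2 * math.sin(t)) for t in angles]
vx, vy = ego_velocity(angles, speeds)
```

Integrating such per-frame velocity estimates over time gives a trajectory without an IMU or wheel encoders, although a practical system would also need to reject moving objects and multipath returns.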

RF-CHORD: Towards Deployable RFID Localization System for Logistics Network

Nov 01, 2022

Bo Liang, Purui Wang, Renjie Zhao, Heyu Guo, Pengyu Zhang, Junchen Guo, Shunmin Zhu, Hongqiang Harry Liu, Xinyu Zhang, Chenren Xu

Abstract: RFID localization is considered the key enabler of automated inventory tracking and management for high-performance logistics networks. A practical and deployable RFID localization system needs to meet reliability, throughput, and range requirements. This paper presents RF-Chord, the first RFID localization system that simultaneously meets all three requirements. RF-Chord features a one-shot multisine-constructed wideband design that can process RF signals with a 200 MHz bandwidth in real time to facilitate one-shot localization at scale. In addition, multiple SINR enhancement techniques are designed for range extension. On top of that, a kernel-layer-based near-field localization framework and a multipath-suppression algorithm are proposed to reduce the 99% long-tail errors. Our empirical results show that RF-Chord can localize more than 180 tags 6 m away from a reader within 1 second and with a 99% long-tail error of 0.786 m, achieving a 0% miss-reading rate and ~0.01% cross-reading rate in warehouse and fresh food delivery store deployments.

* To be published in NSDI 2023
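One reason a wideband multisine helps localization is that the phase of a backscattered tone changes linearly with frequency, with a slope proportional to distance. The sketch below is a generic, noise-free illustration of that relationship and does not reproduce RF-Chord's near-field framework or multipath suppression. It assumes a round-trip phase of -4πfd/c per tone, a hypothetical 5 MHz tone spacing across 200 MHz, and hypothetical function names.

```python
import math
from typing import List, Sequence

C = 299_792_458.0  # speed of light in m/s


def tone_phases(distance_m: float, freqs_hz: Sequence[float]) -> List[float]:
    """Wrapped round-trip phase of each tone for a reflector at distance_m."""
    raw = (-4.0 * math.pi * f * distance_m / C for f in freqs_hz)
    return [math.atan2(math.sin(p), math.cos(p)) for p in raw]


def unwrap(phases: Sequence[float]) -> List[float]:
    """Remove 2*pi jumps between consecutive tones."""
    out = [phases[0]]
    for p in phases[1:]:
        step = p - out[-1]
        step -= 2.0 * math.pi * round(step / (2.0 * math.pi))
        out.append(out[-1] + step)
    return out


def estimate_distance(phases: Sequence[float], freqs_hz: Sequence[float]) -> float:
    """Fit the phase-versus-frequency slope and convert it to distance."""
    phi = unwrap(phases)
    f_mean = sum(freqs_hz) / len(freqs_hz)
    p_mean = sum(phi) / len(phi)
    num = sum((f - f_mean) * (p - p_mean) for f, p in zip(freqs_hz, phi))
    den = sum((f - f_mean) ** 2 for f in freqs_hz)
    return -(num / den) * C / (4.0 * math.pi)


# 41 tones spaced 5 MHz apart cover 200 MHz of bandwidth (assumed layout).
freqs = [k * 5e6 for k in range(41)]
estimate = estimate_distance(tone_phases(6.0, freqs), freqs)
```

Unwrapping only works when adjacent tones differ by less than π in phase, so the tone spacing bounds the unambiguous range, while the total bandwidth sets the resolution. Measuring all tones at once is what makes a single-shot estimate possible in this simplified picture.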
