Diwen Liu

Change-Robust Online Spatial-Semantic Topological Mapping

May 04, 2026

Jiaming Wang, Jizhuo Chen, Diwen Liu, Atharva Ghotavadekar, Jiaxuan Da, Linh Kästner, Harold Soh

Abstract: Autonomous robots require change-robust spatial-semantic reasoning: using spatial and semantic knowledge to decide where to go, how to get there, and where the robot is despite environmental change. Existing approaches typically attach semantics to SLAM-built metric maps, but these pipelines are brittle under appearance shifts and scene dynamics, where data association and relocalization degrade. We propose a Change-Robust Online Spatial-Semantic (CROSS) representation that replaces a globally consistent metric substrate with an online, pose-aware topological graph of RGB-D keyframes. The system explicitly reasons over perceptual ambiguity using sequential hypothesis testing in continuous SE(3). Our estimator maintains a bounded Gaussian-mixture belief over poses, enabling principled handling of loop closures and kidnapped-robot events. Experiments under severe appearance change, including real-robot object-goal navigation with lighting shifts and furniture rearrangement, demonstrate improved robustness over SLAM-based and topological baselines while remaining safe under perceptual aliasing.
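The abstract mentions sequential hypothesis testing without giving details. As a rough illustration of the general idea only, the sketch below implements Wald's classic sequential probability ratio test on scalar Gaussian observations. It is not the paper's SE(3) estimator; the function name `sprt`, the parameters `mu0`, `mu1`, `sigma`, and the error rates `alpha` and `beta` are all assumptions chosen for this toy example.

```python
import math
from typing import Iterable, Tuple


def sprt(
    observations: Iterable[float],
    mu0: float,
    mu1: float,
    sigma: float,
    alpha: float = 0.01,
    beta: float = 0.01,
) -> Tuple[str, int]:
    """Toy Wald SPRT: H0 says the mean is mu0, H1 says the mean is mu1.

    Assumes i.i.d. Gaussian observations with known standard deviation sigma.
    Returns the decision and the number of observations consumed.
    """
    # Wald's approximate decision thresholds on the log-likelihood ratio.
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))

    llr = 0.0
    n = 0
    for n, x in enumerate(observations, start=1):
        # log N(x; mu1, sigma) - log N(x; mu0, sigma)
        llr += ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)
        if llr >= upper:
            return "accept_h1", n
        if llr <= lower:
            return "accept_h0", n
    # Evidence never crossed a threshold: defer the decision.
    return "undecided", n


decision, steps = sprt([1.0] * 20, mu0=0.0, mu1=1.0, sigma=1.0)
```

The "undecided" outcome is the point of the illustration: under ambiguous evidence the test keeps collecting observations instead of committing early, which is one way a system can stay cautious under perceptual aliasing.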

Training-free Task-oriented Grasp Generation

Feb 07, 2025

Jiaming Wang, Jizhuo Chen, Diwen Liu

Abstract: This paper presents a training-free pipeline for task-oriented grasp generation that combines pre-trained grasp generation models with vision-language models (VLMs). Unlike traditional approaches that focus solely on stable grasps, our method incorporates task-specific requirements by leveraging the semantic reasoning capabilities of VLMs. We evaluate five querying strategies, each utilizing different visual representations of candidate grasps, and demonstrate significant improvements over a baseline method in both grasp success and task compliance rates, with absolute gains of up to 36.9% in overall success rate. Our results underline the potential of VLMs to enhance task-oriented manipulation, providing insights for future research in robotic grasping and human-robot interaction.

Dipper: Diversity in Prompts for Producing Large Language Model Ensembles in Reasoning tasks

Dec 12, 2024

Gregory Kang Ruey Lau, Wenyang Hu, Diwen Liu, Jizhuo Chen, See-Kiong Ng, Bryan Kian Hsiang Low

Abstract: Large Language Models still encounter substantial challenges in reasoning tasks, especially for smaller models, which many users may be restricted to due to resource constraints (e.g., GPU memory restrictions). Inference-time methods to boost LLM performance, such as prompting methods to invoke certain reasoning pathways in responses, have been shown to be effective in past work, though they largely rely on sequential queries. The ensemble method, which consists of multiple constituent models running in parallel, is a promising approach to achieving better inference-time performance, especially given recent developments that enabled significant speed-ups in LLM batch inference. In this work, we propose a novel, training-free LLM ensemble framework where a single LLM is fed an optimized, diverse set of prompts in parallel, effectively producing an ensemble at inference time to achieve performance improvement in reasoning tasks. We empirically demonstrate that our method leads to significant gains on math reasoning tasks, e.g., on MATH, where our ensemble consisting of a few small models (e.g., three Qwen2-MATH-1.5B-it models) can outperform a larger model (e.g., Qwen2-MATH-7B-it).

* Accepted to NeurIPS 2024 Workshop on Foundation Model Interventions (MINT)
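The prompt-ensemble idea in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Dipper implementation: the abstract does not say how answers are aggregated, so majority voting is an assumption here, prompt optimization is omitted, and `ensemble_answer`, `toy_model`, and `PROMPTS` are hypothetical names. The stand-in model is a plain function so the example runs without any LLM, and the loop is sequential where a real system would batch the queries.

```python
from collections import Counter
from typing import Callable, List


def ensemble_answer(
    question: str, prompts: List[str], model: Callable[[str], str]
) -> str:
    """Query one model once per prompt and return the most common answer."""
    # Each prompt yields one ensemble member from the same underlying model.
    answers = [model(f"{prompt}\n\n{question}") for prompt in prompts]
    # most_common orders equal counts by first appearance, so ties go to
    # the answer produced by the earliest prompt.
    return Counter(answers).most_common(1)[0][0]


def toy_model(text: str) -> str:
    """Hypothetical stand-in for an LLM call; answers wrongly on one prompt."""
    return "5" if "guess" in text else "4"


PROMPTS = [
    "Think step by step.",
    "Work backwards from the answer.",
    "Take a quick guess.",
]

result = ensemble_answer("What is 2 + 2?", PROMPTS, toy_model)
```

Here two of the three prompts lead the toy model to "4", so the vote overrides the single wrong answer; the benefit depends on the prompts producing errors that do not all coincide.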
