Bui Quang Huy

AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning

Mar 27, 2025

Alan Dao, Dinh Bach Vu, Bui Quang Huy

Abstract: This paper presents AlphaSpace, a novel methodology designed to enhance the spatial reasoning capabilities of language models for robotic manipulation in 3D Cartesian space. AlphaSpace employs a hierarchical semantics-based tokenization strategy that encodes spatial information at both coarse and fine-grained levels. Our approach represents objects with their attributes, positions, and height information through structured tokens, enabling precise spatial reasoning without relying on traditional vision-based embeddings. This approach enables LLMs to accurately manipulate objects by positioning them at specific (x, y, z) coordinates. Experimental results suggest that AlphaSpace demonstrates promising potential for improving manipulation tasks, achieving a total accuracy of 66.67%, compared to 37.5% for GPT-4o and 29.17% for Claude 3.5 Sonnet. These results demonstrate the potential of structured spatial encoding for manipulation tasks and warrant further exploration.
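
The coarse-plus-fine encoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 100-unit grid, the 25-unit coarse cell, the token spellings (`<coarse_..>`, `<fine_..>`, `<height_..>`), and the `SceneObject` type are all assumptions made for this example.

```python
from dataclasses import dataclass

# Assumed values for illustration only; the abstract does not state them.
GRID_SIZE = 100    # workspace resolution along x and y
COARSE_CELL = 25   # width of one coarse cell


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical object record: attributes plus an integer (x, y, z)."""
    color: str
    shape: str
    x: int
    y: int
    z: int


def encode_position(x: int, y: int, z: int) -> list[str]:
    """Split (x, y) into a coarse cell and a fine offset inside that cell."""
    if not (0 <= x < GRID_SIZE and 0 <= y < GRID_SIZE):
        raise ValueError("x and y must lie inside the grid")
    coarse_x, fine_x = divmod(x, COARSE_CELL)
    coarse_y, fine_y = divmod(y, COARSE_CELL)
    return [
        f"<coarse_{coarse_x}_{coarse_y}>",
        f"<fine_{fine_x}_{fine_y}>",
        f"<height_{z}>",
    ]


def encode_object(obj: SceneObject) -> list[str]:
    """Attribute tokens first, then the hierarchical position tokens."""
    return [f"<{obj.color}>", f"<{obj.shape}>"] + encode_position(obj.x, obj.y, obj.z)


def _fields(token: str) -> list[str]:
    # "<coarse_1_3>" -> ["coarse", "1", "3"]
    return token.strip("<>").split("_")


def decode_position(tokens: list[str]) -> tuple[int, int, int]:
    """Invert encode_position: coarse index * cell width + fine offset."""
    coarse, fine, height = (_fields(t) for t in tokens)
    x = int(coarse[1]) * COARSE_CELL + int(fine[1])
    y = int(coarse[2]) * COARSE_CELL + int(fine[2])
    return x, y, int(height[1])


if __name__ == "__main__":
    tokens = encode_position(37, 80, 5)
    print(tokens)                   # ['<coarse_1_3>', '<fine_12_5>', '<height_5>']
    print(decode_position(tokens))  # (37, 80, 5)
```

Under these assumptions the coarse token narrows a position to one of 16 cells and the fine token pins it down inside that cell, so the model reads and emits short symbolic tokens instead of raw coordinates.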

PoseLess: Depth-Free Vision-to-Joint Control via Direct Image Mapping with VLM

Mar 11, 2025

Alan Dao, Dinh Bach Vu, Tuan Le Duc Anh, Bui Quang Huy

Abstract: This paper introduces PoseLess, a novel framework for robot hand control that eliminates the need for explicit pose estimation by directly mapping 2D images to joint angles using projected representations. Our approach leverages synthetic training data generated through randomized joint configurations, enabling zero-shot generalization to real-world scenarios and cross-morphology transfer from robotic to human hands. By projecting visual inputs and employing a transformer-based decoder, PoseLess achieves robust, low-latency control while addressing challenges such as depth ambiguity and data scarcity. Experimental results demonstrate competitive performance in joint angle prediction accuracy without relying on any human-labelled dataset.
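
The "randomized joint configurations" step mentioned in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the joint names, the angle limits in `JOINT_LIMITS_DEG`, and the uniform sampling are hypothetical, and rendering each configuration into an image (which the training pairs would need) is deliberately left out.

```python
import random

# Hypothetical joint names and limits in degrees; not taken from the paper.
JOINT_LIMITS_DEG = {
    "thumb_flex": (0.0, 90.0),
    "index_flex": (0.0, 90.0),
    "middle_flex": (0.0, 90.0),
    "ring_flex": (0.0, 90.0),
    "pinky_flex": (0.0, 90.0),
    "wrist_yaw": (-45.0, 45.0),
}


def sample_joint_configuration(rng: random.Random) -> dict[str, float]:
    """Draw one angle per joint, uniformly within its assumed limits."""
    return {
        name: rng.uniform(low, high)
        for name, (low, high) in JOINT_LIMITS_DEG.items()
    }


def generate_labels(num_samples: int, seed: int = 0) -> list[dict[str, float]]:
    """Produce a reproducible list of joint-angle labels.

    Each label would be paired with an image rendered from that
    configuration in a simulator; that step is outside this sketch.
    """
    rng = random.Random(seed)
    return [sample_joint_configuration(rng) for _ in range(num_samples)]


if __name__ == "__main__":
    for label in generate_labels(3, seed=42):
        print({name: round(angle, 1) for name, angle in label.items()})
```

Because the labels come from the sampler, every image rendered from them has exact ground-truth joint angles, which is why no human-labelled dataset is needed in a pipeline of this kind.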
