Title: Visual Prompting for Robotic Manipulation with Annotation-Guided Pick-and-Place Using ACT

Aug 12, 2025

Muhammad A. Muttaqien, Tomohiro Motoda, Ryo Hanai, Yukiyasu Domae

Abstract: Robotic pick-and-place tasks in convenience stores pose challenges due to dense object arrangements, occlusions, and variations in object properties such as color, shape, size, and texture. These factors complicate trajectory planning and grasping. This paper introduces a perception-action pipeline leveraging annotation-guided visual prompting, where bounding box annotations identify both pickable objects and placement locations, providing structured spatial guidance. Instead of traditional step-by-step planning, we employ Action Chunking with Transformers (ACT) as an imitation learning algorithm, enabling the robotic arm to predict chunked action sequences from human demonstrations. This facilitates smooth, adaptive, and data-driven pick-and-place operations. We evaluate our system based on success rate and visual analysis of grasping behavior, demonstrating improved grasp accuracy and adaptability in retail environments.
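
The two ideas in the abstract, bounding-box visual prompts and chunked action prediction, can be illustrated with a minimal sketch. It is not the authors' implementation. The grid image, the box format, the helper names (`draw_box`, `annotate`, `run_chunked`), and the stand-in `dummy_policy` are all hypothetical. A real system would use camera frames and a trained ACT model.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale grid; hypothetical stand-in for a camera frame
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive; assumed format
Action = List[float]             # e.g. joint targets; dimensionality is an assumption


def draw_box(image: Image, box: Box, value: int) -> Image:
    """Return a copy of `image` with the outline of `box` set to `value`."""
    x0, y0, x1, y1 = box
    out = [row[:] for row in image]
    for x in range(x0, x1 + 1):      # top and bottom edges
        out[y0][x] = value
        out[y1][x] = value
    for y in range(y0, y1 + 1):      # left and right edges
        out[y][x0] = value
        out[y][x1] = value
    return out


def annotate(image: Image, pick_box: Box, place_box: Box) -> Image:
    """Overlay the pick target (value 1) and the placement location (value 2)."""
    return draw_box(draw_box(image, pick_box, 1), place_box, 2)


def run_chunked(
    policy: Callable[[Image, int], List[Action]],
    observe: Callable[[], Image],
    horizon: int,
) -> List[Action]:
    """Query the policy once per chunk and execute each chunk without re-planning."""
    executed: List[Action] = []
    while len(executed) < horizon:
        chunk = policy(observe(), len(executed))
        if not chunk:
            raise ValueError("policy returned an empty action chunk")
        executed.extend(chunk[: horizon - len(executed)])
    return executed


def dummy_policy(image: Image, step: int, chunk_size: int = 4) -> List[Action]:
    """Hypothetical placeholder for a trained model: emits `chunk_size` 1-D actions."""
    return [[float(step + i)] for i in range(chunk_size)]


frame = [[0] * 6 for _ in range(5)]                       # 6 x 5 blank frame
prompted = annotate(frame, pick_box=(0, 0, 2, 2), place_box=(3, 2, 5, 4))
actions = run_chunked(dummy_policy, lambda: prompted, horizon=10)
```

Here the policy is queried three times for ten steps instead of once per step, which is the basic appeal of predicting action chunks. This sketch runs each chunk open-loop and leaves out any blending of overlapping chunks.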
