Shimian Zhang

Innovative Integration of Visual Foundation Model with a Robotic Arm on a Mobile Platform

Apr 29, 2024

Shimian Zhang, Qiuhong Lu

Abstract: In the rapidly advancing field of robotics, the fusion of state-of-the-art visual technologies with mobile robotic arms has emerged as a critical integration. This paper introduces a novel system that combines the Segment Anything model (SAM), a transformer-based visual foundation model, with a robotic arm on a mobile platform. Mounting a depth camera on the robotic arm's end-effector ensures continuous object tracking, significantly mitigating environmental uncertainties. By deploying on a mobile platform, our grasping system has enhanced mobility, which plays a key role in dynamic environments where adaptability is critical. This synthesis enables dynamic object segmentation, tracking, and grasping. It also elevates user interaction, allowing the robot to respond intuitively to various modalities such as clicks, drawings, or voice commands, going beyond traditional robotic systems. Empirical assessments in both simulated and real-world settings demonstrate the system's capabilities. This configuration opens avenues for wide-ranging applications, from industrial settings, agriculture, and household tasks to specialized assignments and beyond.

Bridging Intelligence and Instinct: A New Control Paradigm for Autonomous Robots

Jul 20, 2023

Shimian Zhang

Abstract: While artificial general intelligence (AGI) is advancing at a breathtaking pace, the application of large language models (LLMs) as AI Agents in robotics remains in its nascent stage. A significant concern that hampers the seamless integration of these AI Agents into robotics is the unpredictability of the content they generate, a phenomenon known as "hallucination". Drawing inspiration from biological neural systems, we propose a novel, layered architecture for autonomous robotics, bridging AI agent intelligence and robot instinct. In this context, we define Robot Instinct as the innate or learned set of responses and priorities in an autonomous robotic system that ensures survival-essential tasks, such as safety assurance and obstacle avoidance, are carried out in a timely and effective manner. This paradigm harmoniously combines the intelligence of LLMs with the instinct of robotic behaviors, contributing to a safer and more versatile autonomous robotic system. As a case study, we illustrate this paradigm within the context of a mobile robot, demonstrating its potential to significantly enhance autonomous robotics and enable a future where robots can operate independently and safely across diverse environments.
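
To make the layered idea in this abstract concrete, here is a minimal sketch of an "instinct" layer that can override a command proposed by a higher-level agent. It is an illustration only, not the paper's implementation: the `Command` type, the `instinct_layer` function, and the 0.3 m stop distance are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Command:
    """Hypothetical velocity command for a mobile robot."""
    linear: float   # forward speed in m/s
    angular: float  # turn rate in rad/s


def instinct_layer(proposed: Command, obstacle_distance_m: float,
                   stop_distance_m: float = 0.3) -> Command:
    """Pass the agent's command through unless it is unsafe.

    Assumption: a single forward range reading stands in for the
    robot's obstacle sensing. If an obstacle is closer than the stop
    distance, forward motion is suppressed but turning is still allowed.
    """
    if obstacle_distance_m < stop_distance_m and proposed.linear > 0:
        return Command(linear=0.0, angular=proposed.angular)
    return proposed


# The agent (for example an LLM planner) proposes driving forward.
agent_command = Command(linear=0.5, angular=0.1)
safe_command = instinct_layer(agent_command, obstacle_distance_m=0.2)
```

In this sketch the low-level check always has the final say, so an unreliable plan from the agent cannot drive the robot into a nearby obstacle.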

Novel 3D Scene Understanding Applications From Recurrence in a Single Image

Oct 14, 2022

Shimian Zhang, Skanda Bharadwaj, Keaton Kraiger, Yashasvi Asthana, Hong Zhang, Robert Collins, Yanxi Liu

Abstract: We demonstrate the utility of recurring pattern (RP) discovery from a single image for spatial understanding of a 3D scene in terms of (1) vanishing point detection, (2) hypothesizing 3D translation symmetry and (3) counting the number of RP instances in the image. Furthermore, we illustrate the feasibility of leveraging RP discovery output to form a more precise, quantitative text description of the scene. Our quantitative evaluations on a new 1K+ RP benchmark with diverse variations show that visual perception of recurrence from one single view leads to scene understanding outcomes that are as good as or better than existing supervised methods and/or unsupervised methods that use millions of images.
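
As background for the vanishing point task mentioned above, the following sketch shows one standard piece of projective geometry: lines that are parallel in 3D (for instance, lines joining corresponding points on repeated pattern instances) meet at a vanishing point in the image, which can be found with cross products in homogeneous coordinates. This is a generic textbook construction, not the method of the paper; the function names and the tolerance are assumptions made for this example.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Homogeneous line through two image points (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def intersection(line_a, line_b, eps=1e-9):
    """Intersection of two homogeneous lines as an (x, y) point.

    Returns None when the lines are parallel in the image, meaning
    the vanishing point lies at infinity.
    """
    x, y, w = cross(line_a, line_b)
    if abs(w) < eps:
        return None
    return (x / w, y / w)


# Two image lines, each joining corresponding points on two
# hypothetical pattern instances.
line_1 = line_through((0.0, 0.0), (1.0, 1.0))
line_2 = line_through((0.0, 2.0), (1.0, 1.0))
vanishing_point = intersection(line_1, line_2)
```

With noisy detections, more than two lines would normally be combined with a least-squares or voting scheme rather than a single intersection.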
