Wenzhen Yuan

University of Illinois Urbana-Champaign, USA

Learning to Double Guess: An Active Perception Approach for Estimating the Center of Mass of Arbitrary Objects

Feb 04, 2025

Shengmiao Jin, Yuchen Mo, Wenzhen Yuan

Abstract:Manipulating arbitrary objects in unstructured environments is a significant challenge in robotics, primarily due to difficulties in determining an object's center of mass. This paper introduces U-GRAPH: Uncertainty-Guided Rotational Active Perception with Haptics, a novel framework to enhance center of mass estimation using active perception. Traditional methods often rely on a single interaction and are limited by the inherent inaccuracies of Force-Torque (F/T) sensors. Our approach circumvents these limitations by integrating a Bayesian Neural Network (BNN) to quantify uncertainty and guide the robotic system through multiple, information-rich interactions via grid search and a neural network that scores each action. We demonstrate the generalizability and transferability of our method: trained on a small dataset with limited variation, it still performs well on unseen, complex real-world objects.

* Accepted to ICRA 25; 7 pages, 5 figures
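
The value of a second, uncertainty-aware interaction can be illustrated with textbook inverse-variance fusion of two Gaussian guesses. This is only a sketch under stated assumptions: the abstract does not say how U-GRAPH combines its interactions, and the function name `fuse_estimates` and all numbers below are hypothetical.

```python
def fuse_estimates(mean_a, var_a, mean_b, var_b):
    """Fuse two independent Gaussian estimates of one coordinate.

    Assumption: the two guesses have independent Gaussian errors.
    This is standard inverse-variance weighting, not the paper's
    confirmed method.
    """
    if var_a <= 0 or var_b <= 0:
        raise ValueError("variances must be positive")
    weight_a = 1.0 / var_a
    weight_b = 1.0 / var_b
    fused_var = 1.0 / (weight_a + weight_b)
    fused_mean = fused_var * (weight_a * mean_a + weight_b * mean_b)
    return fused_mean, fused_var


# Hypothetical numbers: a first guess of 0.10 m (variance 0.04)
# and a more confident second guess of 0.06 m (variance 0.01).
mean, var = fuse_estimates(0.10, 0.04, 0.06, 0.01)
```

The fused variance is always smaller than either input variance, which is one way to see why an additional, well-chosen interaction can tighten the estimate.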

GelBelt: A Vision-based Tactile Sensor for Continuous Sensing of Large Surfaces

Jan 09, 2025

Mohammad Amin Mirzaee, Hung-Jui Huang, Wenzhen Yuan

Abstract:Scanning large-scale surfaces is widely demanded in surface reconstruction applications and in defect detection during industrial quality control and maintenance. Traditional vision-based tactile sensors have shown promising performance in high-resolution shape reconstruction while suffering limitations such as small sensing areas or susceptibility to damage when slid across surfaces, making them unsuitable for continuous sensing on large surfaces. To address these shortcomings, we introduce a novel vision-based tactile sensor designed for continuous surface sensing applications. Our design uses an elastomeric belt and two wheels to continuously scan the target surface. The proposed sensor showed promising results in both shape reconstruction and surface fusion, indicating its applicability. The dot product of the estimated and reference surface normal map is reported over the sensing area and for different scanning speeds. Results indicate that the proposed sensor can rapidly scan large-scale surfaces with high accuracy at speeds up to 45 mm/s.

* Accepted to IEEE RA-L. 8 pages, 7 figures, webpage: https://aminmirz.github.io/GelBelt/
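
The normal-map comparison mentioned in the abstract can be sketched as an average dot product between paired unit normals, where 1.0 means perfect agreement. This is a minimal illustration, assuming normals are given as plain 3-tuples; the helper names and the sample data are hypothetical, and the paper may aggregate the metric differently.

```python
import math


def normalize(vec):
    """Scale a 3D vector to unit length."""
    norm = math.sqrt(sum(c * c for c in vec))
    if norm == 0.0:
        raise ValueError("zero-length normal")
    return tuple(c / norm for c in vec)


def mean_normal_dot(estimated, reference):
    """Average dot product between paired normals (1.0 = identical)."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for est, ref in zip(estimated, reference):
        unit_est = normalize(est)
        unit_ref = normalize(ref)
        total += sum(a * b for a, b in zip(unit_est, unit_ref))
    return total / len(estimated)


# Hypothetical two-pixel map: one normal matches the reference,
# the other is tilted 45 degrees away from it.
reference = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
estimated = [(0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
score = mean_normal_dot(estimated, reference)
```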

NormalFlow: Fast, Robust, and Accurate Contact-based Object 6DoF Pose Tracking with Vision-based Tactile Sensors

Dec 12, 2024

Hung-Jui Huang, Michael Kaess, Wenzhen Yuan

Abstract:Tactile sensing is crucial for robots aiming to achieve human-level dexterity. Among tactile-dependent skills, tactile-based object tracking serves as the cornerstone for many tasks, including manipulation, in-hand manipulation, and 3D reconstruction. In this work, we introduce NormalFlow, a fast, robust, and real-time tactile-based 6DoF tracking algorithm. Leveraging the precise surface normal estimation of vision-based tactile sensors, NormalFlow determines object movements by minimizing discrepancies between the tactile-derived surface normals. Our results show that NormalFlow consistently outperforms competitive baselines and can track low-texture objects like table surfaces. For long-horizon tracking, we demonstrate that when rolling the sensor around a bead for 360 degrees, NormalFlow maintains a rotational tracking error of 2.5 degrees. Additionally, we present state-of-the-art tactile-based 3D reconstruction results, showcasing the high accuracy of NormalFlow. We believe NormalFlow unlocks new possibilities for high-precision perception and manipulation tasks that involve interacting with objects using hands. The video demo, code, and dataset are available on our website: https://joehjhuang.github.io/normalflow.

* IEEE Robotics and Automation Letters (Volume: 10, Issue: 1, January 2025)
* 8 pages, published in 2024 RA-L, website link: https://joehjhuang.github.io/normalflow
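
The idea of recovering motion by minimizing the discrepancy between two sets of surface normals can be shown with a planar toy problem. This is not the NormalFlow algorithm, which works on 3D normal maps and estimates a full 6DoF pose; the sketch below assumes 2D unit normals with known correspondences and uses the closed-form least-squares rotation angle. All function names and data are hypothetical.

```python
import math


def estimate_rotation(normals_before, normals_after):
    """Return the angle (radians) that best rotates `normals_before`
    onto `normals_after` in the least-squares sense (2D toy case)."""
    if len(normals_before) != len(normals_after) or not normals_before:
        raise ValueError("need two non-empty lists of equal length")
    sum_dot = 0.0
    sum_cross = 0.0
    for (ax, ay), (bx, by) in zip(normals_before, normals_after):
        sum_dot += ax * bx + ay * by
        sum_cross += ax * by - ay * bx
    # Closed-form minimizer of sum ||R(theta) a_i - b_i||^2.
    return math.atan2(sum_cross, sum_dot)


def rotate(vec, theta):
    """Rotate a 2D vector counter-clockwise by theta radians."""
    x, y = vec
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))


# Hypothetical data: three unit normals rotated by 0.3 rad.
before = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8)]
after = [rotate(n, 0.3) for n in before]
angle = estimate_rotation(before, after)
```

Because the objective depends only on normal directions and not on image texture, a formulation of this kind does not need distinctive surface features, which is consistent with the abstract's claim about low-texture objects.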

Tactile DreamFusion: Exploiting Tactile Sensing for 3D Generation

Dec 09, 2024

Ruihan Gao, Kangle Deng, Gengshan Yang, Wenzhen Yuan, Jun-Yan Zhu

Abstract:3D generation methods have shown visually compelling results powered by diffusion image priors. However, they often fail to produce realistic geometric details, resulting in overly smooth surfaces or geometric details inaccurately baked in albedo maps. To address this, we introduce a new method that incorporates touch as an additional modality to improve the geometric details of generated 3D assets. We design a lightweight 3D texture field to synthesize visual and tactile textures, guided by 2D diffusion model priors on both visual and tactile domains. We condition the visual texture generation on high-resolution tactile normals and guide the patch-based tactile texture refinement with a customized TextureDreambooth. We further present a multi-part generation pipeline that enables us to synthesize different textures across various regions. To our knowledge, we are the first to leverage high-resolution tactile sensing to enhance geometric details for 3D generation tasks. We evaluate our method in both text-to-3D and image-to-3D settings. Our experiments demonstrate that our method provides customized and realistic fine geometric textures while maintaining accurate alignment between two modalities of vision and touch.

* Accepted to NeurIPS 2024. Project webpage: https://ruihangao.github.io/TactileDreamFusion/ Code: https://github.com/RuihanGao/TactileDreamFusion

FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction

Oct 10, 2024

Irving Fang, Kairui Shi, Xujin He, Siqi Tan, Yifan Wang, Hanwen Zhao, Hung-Jui Huang, Wenzhen Yuan, Chen Feng, Jing Zhang

Abstract:Humans effortlessly integrate common-sense knowledge with sensory input from vision and touch to understand their surroundings. Emulating this capability, we introduce FusionSense, a novel 3D reconstruction framework that enables robots to fuse priors from foundation models with highly sparse observations from vision and tactile sensors. FusionSense addresses three key challenges: (i) How can robots efficiently acquire robust global shape information about the surrounding scene and objects? (ii) How can robots strategically select touch points on the object using geometric and common-sense priors? (iii) How can partial observations such as tactile signals improve the overall representation of the object? Our framework employs 3D Gaussian Splatting as a core representation and incorporates a hierarchical optimization strategy involving global structure construction, object visual hull pruning and local geometric constraints. This advancement results in fast and robust perception in environments with traditionally challenging objects that are transparent, reflective, or dark, enabling more downstream manipulation or navigation tasks. Experiments on real-world data suggest that our framework outperforms previously state-of-the-art sparse-view methods. All code and data are open-sourced on the project website.

MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios

Sep 24, 2024

Jiacheng Ruan, Wenzhen Yuan, Zehao Lin, Ning Liao, Zhiyu Li, Feiyu Xiong, Ting Liu, Yuzhuo Fu

Abstract:Large visual-language models (LVLMs) have achieved great success in multiple applications. However, they still encounter challenges in complex scenes, especially those involving camouflaged objects. This is primarily due to the lack of samples related to camouflaged scenes in the training dataset. To mitigate this issue, we construct the MM-CamObj dataset for the first time, comprising two subsets: CamObj-Align and CamObj-Instruct. Specifically, CamObj-Align contains 11,363 image-text pairs, and it is designed for VL alignment and injecting rich knowledge of camouflaged scenes into LVLMs. CamObj-Instruct is collected for fine-tuning the LVLMs with improved instruction-following capabilities, and it includes 11,363 images and 68,849 conversations with diverse instructions. Based on the MM-CamObj dataset, we propose CamObj-Llava, an LVLM specifically designed for addressing tasks in camouflaged scenes. To facilitate our model's effective acquisition of knowledge about camouflaged objects and scenes, we introduce a curriculum learning strategy with six distinct modes. Additionally, we construct CamObj-Bench to evaluate existing LVLMs' capabilities in understanding, recognition, localization, and counting in camouflaged scenes. This benchmark includes 600 images and 7 tasks, with a total of 9,449 questions. Extensive experiments are conducted on CamObj-Bench with CamObj-Llava, 8 existing open-source and 3 closed-source LVLMs. Surprisingly, the results indicate that our model achieves a 25.84% improvement in 4 out of 7 tasks compared to GPT-4o. Code and datasets will be available at https://github.com/JCruan519/MM-CamObj.

* 9 pages, 5 figures. Work in progress

An Intelligent Robotic System for Perceptive Pancake Batter Stirring and Precise Pouring

Jul 01, 2024

Xinyuan Luo, Shengmiao Jin, Hung-Jui Huang, Wenzhen Yuan

Abstract:Cooking robots have long been desired by the commercial market, but the technical challenges remain significant. A major difficulty comes from the need to perceive and handle liquids with different properties. This paper presents a robot system that mixes batter and makes pancakes out of it, where understanding and handling the viscous liquid is an essential component. The system integrates haptic sensing and control algorithms to autonomously stir flour and water to achieve the desired batter uniformity, estimate the batter's properties such as the water-flour ratio and liquid level, and perform precise manipulations to pour the batter into any specified shape. Experimental results show the system's capability to always produce batter of desired uniformity, estimate water-flour ratio and liquid level precisely, and accurately pour it into complex shapes. This research showcases the potential for robots to assist in kitchens and is a step towards commercial culinary automation.

* 8 pages, 10 figures, Accepted to IROS 2024

Scalable, Simulation-Guided Compliant Tactile Finger Design

Mar 07, 2024

Yuxiang Ma, Arpit Agarwal, Sandra Q. Liu, Wenzhen Yuan, Edward H. Adelson

Abstract:Compliant grippers enable robots to work with humans in unstructured environments. In general, these grippers can improve with tactile sensing to estimate the state of objects around them to precisely manipulate objects. However, co-designing compliant structures with high-resolution tactile sensing is a challenging task. We propose a simulation framework for the end-to-end forward design of GelSight Fin Ray sensors. Our simulation framework consists of mechanical simulation using the finite element method (FEM) and optical simulation including physically based rendering (PBR). To simulate the fluorescent paint used in these GelSight Fin Rays, we propose an efficient method that can be directly integrated in PBR. Using the simulation framework, we investigate design choices available in the compliant grippers, namely gel pad shapes, illumination conditions, Fin Ray gripper sizes, and Fin Ray stiffness. This infrastructure enables faster design and prototype time frames of new Fin Ray sensors that have various sensing areas, ranging from 48 mm $\times$ 18 mm to 70 mm $\times$ 35 mm. Given the parameters we choose, we can thus optimize different Fin Ray designs and show their utility in grasping day-to-day objects.

* Yuxiang Ma, Arpit Agarwal, and Sandra Q. Liu contributed equally to this work. Project video: https://youtu.be/CnTUTA5cfMw . 7 pages, 11 figures, 2024 IEEE International Conference on Soft Robotics (RoboSoft)

Kitchen Artist: Precise Control of Liquid Dispensing for Gourmet Plating

Nov 20, 2023

Hung-Jui Huang, Jingyi Xiang, Wenzhen Yuan

Abstract:Manipulating liquid is widely required for many tasks, especially in cooking. A common way to address this is extruding viscous liquid from a squeeze bottle. In this work, our goal is to create a sauce plating robot, which requires precise control of the thickness of squeezed liquids on a surface. Different liquids demand different manipulation policies. We command the robot to tilt the container and monitor the liquid response using a force sensor to identify liquid properties. Based on the liquid properties, we predict the liquid behavior with fixed squeezing motions in a data-driven way and calculate the required drawing speed for the desired stroke size. This open-loop system works effectively even without sensor feedback. Our experiments demonstrate accurate stroke size control across different liquids and fill levels. We show that understanding liquid properties can facilitate effective liquid manipulation. More importantly, our dish garnishing robot has a wide range of applications and holds significant commercialization potential.

* Submitted to ICRA 2024
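
The step of computing a drawing speed for a desired stroke size can be illustrated with a simple volume-conservation relation. This is a sketch under strong assumptions that are not taken from the paper: the extruded flow rate is treated as known and constant, and the stroke is modeled with a rectangular cross-section. The paper instead predicts liquid behavior in a data-driven way. The function name and numbers are hypothetical.

```python
def drawing_speed(flow_rate_mm3_s, stroke_width_mm, stroke_thickness_mm):
    """Nozzle speed (mm/s) at which a constant flow rate deposits a
    stroke of the requested width and thickness.

    Assumption: rectangular stroke cross-section, so that
    flow_rate = width * thickness * speed.
    """
    if flow_rate_mm3_s < 0:
        raise ValueError("flow rate must be non-negative")
    if stroke_width_mm <= 0 or stroke_thickness_mm <= 0:
        raise ValueError("stroke dimensions must be positive")
    return flow_rate_mm3_s / (stroke_width_mm * stroke_thickness_mm)


# Hypothetical values: 200 mm^3/s of sauce, 4 mm wide, 1 mm thick.
speed = drawing_speed(200.0, 4.0, 1.0)
```

Under this model, a thinner or narrower stroke at the same flow rate requires moving the nozzle faster.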

Robotic Defect Inspection with Visual and Tactile Perception for Large-scale Components

Sep 08, 2023

Arpit Agarwal, Abhiroop Ajith, Chengtao Wen, Veniamin Stryzheus, Brian Miller, Matthew Chen, Micah K. Johnson, Jose Luis Susa Rincon, Justinian Rosca, Wenzhen Yuan

Abstract:In manufacturing processes, surface inspection is a key requirement for quality assessment and damage localization. Due to this, automated surface anomaly detection has become a promising area of research in various industrial inspection systems. A particular challenge in industries with large-scale components, like aircraft and heavy machinery, is inspecting large parts with very small defect dimensions. Moreover, these parts can be of curved shapes. To address this challenge, we present a 2-stage multi-modal inspection pipeline with visual and tactile sensing. Our approach combines the best of both visual and tactile sensing by identifying and localizing defects using a global view (vision) and using the localized area for tactile scanning to identify the remaining defects. To benchmark our approach, we propose a novel real-world dataset with multiple metallic defect types per image, collected in production environments on real aerospace manufacturing parts, as well as online robot experiments in two environments. Our approach is able to identify 85% of defects using Stage I and 100% of defects after Stage II. The dataset is publicly available at https://zenodo.org/record/8327713

* This is a pre-print for International Conference on Intelligent Robots and Systems 2023 publication
