Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hongyu Yu

Master Rules from Chaos: Learning to Reason, Plan, and Interact from Chaos for Tangram Assembly

May 17, 2025

Chao Zhao, Chunli Jiang, Lifan Luo, Guanlan Zhang, Hongyu Yu, Michael Yu Wang, Qifeng Chen

Abstract:Tangram assembly, the art of human intelligence and manipulation dexterity, is a new challenge for robotics and reveals the limitations of state-of-the-arts. Here, we describe our initial exploration and highlight key problems in reasoning, planning, and manipulation for robotic tangram assembly. We present MRChaos (Master Rules from Chaos), a robust and general solution for learning assembly policies that can generalize to novel objects. In contrast to conventional methods based on prior geometric and kinematic models, MRChaos learns to assemble randomly generated objects through self-exploration in simulation without prior experience in assembling target objects. The reward signal is obtained from the visual observation change without manually designed models or annotations. MRChaos retains its robustness in assembling various novel tangram objects that have never been encountered during training, with only silhouette prompts. We show the potential of MRChaos in wider applications such as cutlery combinations. The presented work indicates that radical generalization in robotic assembly can be achieved by learning in much simpler domains.

* 7 pages, accepted by ICRA 2025

Via

Access Paper or Ask Questions

Learning thin deformable object manipulation with a multi-sensory integrated soft hand

Nov 21, 2024

Chao Zhao, Chunli Jiang, Lifan Luo, Shuai Yuan, Qifeng Chen, Hongyu Yu

Figure 1 for Learning thin deformable object manipulation with a multi-sensory integrated soft hand

Figure 2 for Learning thin deformable object manipulation with a multi-sensory integrated soft hand

Figure 3 for Learning thin deformable object manipulation with a multi-sensory integrated soft hand

Figure 4 for Learning thin deformable object manipulation with a multi-sensory integrated soft hand

Abstract:Robotic manipulation has made significant advancements, with systems demonstrating high precision and repeatability. However, this remarkable precision often fails to translate into efficient manipulation of thin deformable objects. Current robotic systems lack imprecise dexterity, the ability to perform dexterous manipulation through robust and adaptive behaviors that do not rely on precise control. This paper explores the singulation and grasping of thin, deformable objects. Here, we propose a novel solution that incorporates passive compliance, touch, and proprioception into thin, deformable object manipulation. Our system employs a soft, underactuated hand that provides passive compliance, facilitating adaptive and gentle interactions to dexterously manipulate deformable objects without requiring precise control. The tactile and force/torque sensors equipped on the hand, along with a depth camera, gather sensory data required for manipulation via the proposed slip module. The manipulation policies are learned directly from raw sensory data via model-free reinforcement learning, bypassing explicit environmental and object modeling. We implement a hierarchical double-loop learning process to enhance learning efficiency by decoupling the action space. Our method was deployed on real-world robots and trained in a self-supervised manner. The resulting policy was tested on a variety of challenging tasks that were beyond the capabilities of prior studies, ranging from displaying suit fabric like a salesperson to turning pages of sheet music for violinists.

* 19 pages

Via

Access Paper or Ask Questions

CompdVision: Combining Near-Field 3D Visual and Tactile Sensing Using a Compact Compound-Eye Imaging System

Dec 12, 2023

Lifan Luo, Boyang Zhang, Zhijie Peng, Yik Kin Cheung, Guanlan Zhang, Zhigang Li, Michael Yu Wang, Hongyu Yu

Figure 1 for CompdVision: Combining Near-Field 3D Visual and Tactile Sensing Using a Compact Compound-Eye Imaging System

Figure 2 for CompdVision: Combining Near-Field 3D Visual and Tactile Sensing Using a Compact Compound-Eye Imaging System

Figure 3 for CompdVision: Combining Near-Field 3D Visual and Tactile Sensing Using a Compact Compound-Eye Imaging System

Figure 4 for CompdVision: Combining Near-Field 3D Visual and Tactile Sensing Using a Compact Compound-Eye Imaging System

Abstract:As automation technologies advance, the need for compact and multi-modal sensors in robotic applications is growing. To address this demand, we introduce CompdVision, a novel sensor that combines near-field 3D visual and tactile sensing. This sensor, with dimensions of 22$\times$14$\times$14 mm, leverages the compound eye imaging system to achieve a compact form factor without compromising its dual modalities. CompdVision utilizes two types of vision units to meet diverse sensing requirements. Stereo units with far-focus lenses can see through the transparent elastomer, facilitating depth estimation beyond the contact surface, while tactile units with near-focus lenses track the movement of markers embedded in the elastomer to obtain contact deformation. Experimental results validate the sensor's superior performance in 3D visual and tactile sensing. The sensor demonstrates effective depth estimation within a 70mm range from its surface. Additionally, it registers high accuracy in tangential and normal force measurements. The dual modalities and compact design make the sensor a versatile tool for complex robotic tasks.

Via

Access Paper or Ask Questions

Reconfigurable, Transformable Soft Pneumatic Actuator with Tunable 3D Deformations for Dexterous Soft Robotics Applications

Nov 06, 2023

Dickson Chiu Yu Wong, Mingtan Li, Shijie Kang, Lifan Luo, Hongyu Yu

Abstract:Numerous soft actuators based on PneuNet design have already been proposed and extensively employed across various soft robotics applications in recent years. Despite their widespread use, a common limitation of most existing designs is that their action is pre-determined during the fabrication process, thereby restricting the ability to modify or alter their function during operation. To address this shortcoming, in this article the design of a Reconfigurable, Transformable Soft Pneumatic Actuator (RT-SPA) is proposed. The working principle of the RT-SPA is analogous to the conventional PneuNet. The key distinction between the two lies in the ability of the RT-SPA to undergo controlled transformations, allowing for more versatile bending and twisting motions in various directions. Furthermore, the unique reconfigurable design of the RT-SPA enables the selection of actuation units with different sizes to achieve a diverse range of three-dimensional deformations. This versatility enhances the potential of the RT-SPA for adaptation to a multitude of tasks and environments, setting it apart from traditional PneuNet. The paper begins with a detailed description of the design and fabrication of the RT-SPA. Following this, a series of experiments are conducted to evaluate the performance of the RT-SPA. Finally, the abilities of the RT-SPA for locomotion, gripping, and object manipulation are demonstrated to illustrate the versatility of the RT-SPA across different aspects.

* Submitted to Soft Robotics Journal. 12 pages, 10 figures

Via

Access Paper or Ask Questions

FSDNet-An efficient fire detection network for complex scenarios based on YOLOv3 and DenseNet

Apr 15, 2023

Li Zhu, Jiahui Xiong, Wenxian Wu, Hongyu Yu

Figure 1 for FSDNet-An efficient fire detection network for complex scenarios based on YOLOv3 and DenseNet

Figure 2 for FSDNet-An efficient fire detection network for complex scenarios based on YOLOv3 and DenseNet

Figure 3 for FSDNet-An efficient fire detection network for complex scenarios based on YOLOv3 and DenseNet

Figure 4 for FSDNet-An efficient fire detection network for complex scenarios based on YOLOv3 and DenseNet

Abstract:Fire is one of the common disasters in daily life. To achieve fast and accurate detection of fires, this paper proposes a detection network called FSDNet (Fire Smoke Detection Network), which consists of a feature extraction module, a fire classification module, and a fire detection module. Firstly, a dense connection structure is introduced in the basic feature extraction module to enhance the feature extraction ability of the backbone network and alleviate the gradient disappearance problem. Secondly, a spatial pyramid pooling structure is introduced in the fire detection module, and the Mosaic data augmentation method and CIoU loss function are used in the training process to comprehensively improve the flame feature extraction ability. Finally, in view of the shortcomings of public fire datasets, a fire dataset called MS-FS (Multi-scene Fire And Smoke) containing 11938 fire images was created through data collection, screening, and object annotation. To prove the effectiveness of the proposed method, the accuracy of the method was evaluated on two benchmark fire datasets and MS-FS. The experimental results show that the accuracy of FSDNet on the two benchmark datasets is 99.82% and 91.15%, respectively, and the average precision on MS-FS is 86.80%, which is better than the mainstream fire detection methods.

Via

Access Paper or Ask Questions

ERRA: An Embodied Representation and Reasoning Architecture for Long-horizon Language-conditioned Manipulation Tasks

Apr 05, 2023

Chao Zhao, Shuai Yuan, Chunli Jiang, Junhao Cai, Hongyu Yu, Michael Yu Wang, Qifeng Chen

Abstract:This letter introduces ERRA, an embodied learning architecture that enables robots to jointly obtain three fundamental capabilities (reasoning, planning, and interaction) for solving long-horizon language-conditioned manipulation tasks. ERRA is based on tightly-coupled probabilistic inferences at two granularity levels. Coarse-resolution inference is formulated as sequence generation through a large language model, which infers action language from natural language instruction and environment state. The robot then zooms to the fine-resolution inference part to perform the concrete action corresponding to the action language. Fine-resolution inference is constructed as a Markov decision process, which takes action language and environmental sensing as observations and outputs the action. The results of action execution in environments provide feedback for subsequent coarse-resolution reasoning. Such coarse-to-fine inference allows the robot to decompose and achieve long-horizon tasks interactively. In extensive experiments, we show that ERRA can complete various long-horizon manipulation tasks specified by abstract language instructions. We also demonstrate successful generalization to the novel but similar natural language instructions.

* Accepted to IEEE Robotics and Automation Letters (RA-L)

Via

Access Paper or Ask Questions

Flipbot: Learning Continuous Paper Flipping via Coarse-to-Fine Exteroceptive-Proprioceptive Exploration

Apr 05, 2023

Chao Zhao, Chunli Jiang, Junhao Cai, Michael Yu Wang, Hongyu Yu, Qifeng Chen

Abstract:This paper tackles the task of singulating and grasping paper-like deformable objects. We refer to such tasks as paper-flipping. In contrast to manipulating deformable objects that lack compression strength (such as shirts and ropes), minor variations in the physical properties of the paper-like deformable objects significantly impact the results, making manipulation highly challenging. Here, we present Flipbot, a novel solution for flipping paper-like deformable objects. Flipbot allows the robot to capture object physical properties by integrating exteroceptive and proprioceptive perceptions that are indispensable for manipulating deformable objects. Furthermore, by incorporating a proposed coarse-to-fine exploration process, the system is capable of learning the optimal control parameters for effective paper-flipping through proprioceptive and exteroceptive inputs. We deploy our method on a real-world robot with a soft gripper and learn in a self-supervised manner. The resulting policy demonstrates the effectiveness of Flipbot on paper-flipping tasks with various settings beyond the reach of prior studies, including but not limited to flipping pages throughout a book and emptying paper sheets in a box.

* Accepted to International Conference on Robotics and Automation (ICRA) 2023

Via

Access Paper or Ask Questions

Learn to Grasp via Intention Discovery and its Application to Challenging Clutter

Apr 05, 2023

Chao Zhao, Chunli Jiang, Junhao Cai, Hongyu Yu, Michael Yu Wang, Qifeng Chen

Figure 1 for Learn to Grasp via Intention Discovery and its Application to Challenging Clutter

Figure 2 for Learn to Grasp via Intention Discovery and its Application to Challenging Clutter

Figure 3 for Learn to Grasp via Intention Discovery and its Application to Challenging Clutter

Figure 4 for Learn to Grasp via Intention Discovery and its Application to Challenging Clutter

Abstract:Humans excel in grasping objects through diverse and robust policies, many of which are so probabilistically rare that exploration-based learning methods hardly observe and learn. Inspired by the human learning process, we propose a method to extract and exploit latent intents from demonstrations, and then learn diverse and robust grasping policies through self-exploration. The resulting policy can grasp challenging objects in various environments with an off-the-shelf parallel gripper. The key component is a learned intention estimator, which maps gripper pose and visual sensory to a set of sub-intents covering important phases of the grasping movement. Sub-intents can be used to build an intrinsic reward to guide policy learning. The learned policy demonstrates remarkable zero-shot generalization from simulation to the real world while retaining its robustness against states that have never been encountered during training, novel objects such as protractors and user manuals, and environments such as the cluttered conveyor.

* IEEE Robotics and Automation Letters, vol. 8, no. 2, pp. 488-495, Feb. 2023
* Accepted to IEEE Robotics and Automation Letters (RA-L)

Via

Access Paper or Ask Questions

Capturing long-range interaction with reciprocal space neural network

Nov 30, 2022

Hongyu Yu, Liangliang Hong, Shiyou Chen, Xingao Gong, Hongjun Xiang

Abstract:Machine Learning (ML) interatomic models and potentials have been widely employed in simulations of materials. Long-range interactions often dominate in some ionic systems whose dynamics behavior is significantly influenced. However, the long-range effect such as Coulomb and Van der Wales potential is not considered in most ML interatomic potentials. To address this issue, we put forward a method that can take long-range effects into account for most ML local interatomic models with the reciprocal space neural network. The structure information in real space is firstly transformed into reciprocal space and then encoded into a reciprocal space potential or a global descriptor with full atomic interactions. The reciprocal space potential and descriptor keep full invariance of Euclidean symmetry and choice of the cell. Benefiting from the reciprocal-space information, ML interatomic models can be extended to describe the long-range potential including not only Coulomb but any other long-range interaction. A model NaCl system considering Coulomb interaction and the GaxNy system with defects are applied to illustrate the advantage of our approach. At the same time, our approach helps to improve the prediction accuracy of some global properties such as the band gap where the full atomic interaction beyond local atomic environments plays a very important role. In summary, our work has expanded the ability of current ML interatomic models and potentials when dealing with the long-range effect, hence paving a new way for accurate prediction of global properties and large-scale dynamic simulations of systems with defects.

* 15 pages, 3 figures, 3 tables

Via

Access Paper or Ask Questions

Time-reversal equivariant neural network potential and Hamiltonian for magnetic materials

Nov 21, 2022

Hongyu Yu, Yang Zhong, Junyi Ji, Xingao Gong, Hongjun Xiang

Abstract:This work presents Time-reversal Equivariant Neural Network (TENN) framework. With TENN, the time-reversal symmetry is considered in the equivariant neural network (ENN), which generalizes the ENN to consider physical quantities related to time-reversal symmetry such as spin and velocity of atoms. TENN-e3, as the time-reversal-extension of E(3) equivariant neural network, is developed to keep the Time-reversal E(3) equivariant with consideration of whether to include the spin-orbit effect for both collinear and non-collinear magnetic moments situations for magnetic material. TENN-e3 can construct spin neural network potential and the Hamiltonian of magnetic material from ab-initio calculations. Time-reversal-E(3)-equivariant convolutions for interactions of spinor and geometric tensors are employed in TENN-e3. Compared to the popular ENN, TENN-e3 can describe the complex spin-lattice coupling with high accuracy and keep time-reversal symmetry which is not preserved in the existing E(3)-equivariant model. Also, the Hamiltonian of magnetic material with time-reversal symmetry can be built with TENN-e3. TENN paves a new way to spin-lattice dynamics simulations over long-time scales and electronic structure calculations of large-scale magnetic materials.

* 15 pages,2 figures and 2 tables

Via

Access Paper or Ask Questions