Jiangpin Liu

RynnEC: Bringing MLLMs into Embodied World

Aug 19, 2025

Ronghao Dang, Yuqian Yuan, Yunxuan Mao, Kehan Li, Jiangpin Liu, Zhikai Wang, Xin Li, Fan Wang, Deli Zhao

Abstract: We introduce RynnEC, a video multimodal large language model designed for embodied cognition. Built upon a general-purpose vision-language foundation model, RynnEC incorporates a region encoder and a mask decoder, enabling flexible region-level video interaction. Despite its compact architecture, RynnEC achieves state-of-the-art performance in object property understanding, object segmentation, and spatial reasoning. Conceptually, it offers a region-centric video paradigm for the brain of embodied agents, providing fine-grained perception of the physical world and enabling more precise interactions. To mitigate the scarcity of annotated 3D datasets, we propose an egocentric-video-based pipeline for generating embodied cognition data. Furthermore, we introduce RynnEC-Bench, a region-centered benchmark for evaluating embodied cognitive capabilities. We anticipate that RynnEC will advance the development of general-purpose cognitive cores for embodied agents and facilitate generalization across diverse embodied tasks. The code, model checkpoints, and benchmark are available at: https://github.com/alibaba-damo-academy/RynnEC

* The technical report of RynnEC, an embodied cognition MLLM

Domain-Conditioned Scene Graphs for State-Grounded Task Planning

Apr 09, 2025

Jonas Herzog, Jiangpin Liu, Yue Wang

Abstract: Recent robotic task planning frameworks have integrated large multimodal models (LMMs) such as GPT-4V. To address grounding issues of such models, it has been suggested to split the pipeline into perceptual state grounding and subsequent state-based planning. As we show in this work, the state grounding ability of LMM-based approaches is still limited by weaknesses in granular, structured, domain-specific scene understanding. To address this shortcoming, we develop a more structured state grounding framework that features a domain-conditioned scene graph as its scene representation. We show that such a representation is actionable in nature, as it is directly mappable to a symbolic state in classical planning languages such as PDDL. We provide an instantiation of our state grounding framework where the domain-conditioned scene graph generation is implemented with a lightweight vision-language approach that classifies domain-specific predicates on top of domain-relevant object detections. Evaluated across three domains, our approach achieves significantly higher state estimation accuracy and task planning success rates compared to previous LMM-based approaches.
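The claim that a scene graph maps directly to a symbolic PDDL state can be illustrated with a minimal sketch. This is not the authors' implementation: the `Relation` class, the `scene_graph_to_pddl` function, and the example object and predicate names are all hypothetical, chosen only to show how typed objects and predicate edges might be rendered as `:objects` and `:init` sections.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One scene-graph fact (hypothetical structure): a predicate over object names."""
    predicate: str
    args: tuple


def scene_graph_to_pddl(objects, relations):
    """Render typed objects and relations as PDDL :objects and :init sections.

    `objects` maps an object name to its type; `relations` is a list of Relation.
    """
    # Sort objects so the output is deterministic.
    obj_part = " ".join(f"{name} - {typ}" for name, typ in sorted(objects.items()))
    # Each relation becomes one ground atom, e.g. (on cup1 table1).
    facts = " ".join("(" + " ".join((r.predicate, *r.args)) + ")" for r in relations)
    return f"(:objects {obj_part})\n(:init {facts})"


# Illustrative scene: a cup on a table (names are assumptions, not from the paper).
objects = {"cup1": "cup", "table1": "table"}
relations = [Relation("on", ("cup1", "table1")), Relation("clear", ("cup1",))]
pddl_state = scene_graph_to_pddl(objects, relations)
print(pddl_state)
```

In this sketch the nodes carry the object types and the edges carry the predicates, so the planner's initial state falls out of the graph without a free-form language step in between.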

Sparse Waypoint Validity Checking for Self-Entanglement-Free Tethered Path Planning

Aug 30, 2023

Tong Yang, Jiangpin Liu, Yue Wang, Rong Xiong

Abstract: A novel mechanism to derive self-entanglement-free (SEF) paths for tethered differential-driven robots is proposed in this work. The problem is tailored to the deployment of tethered differential-driven robots in situations where an omni-directional tether retractor is not available. This is frequently encountered when it is impractical to concurrently equip an omni-directional tether retracting mechanism with other geometrically intricate devices, such as a manipulator, which is notably relevant in applications like disaster recovery, spatial exploration, etc. Without specific attention to the spatial relation between the shape of the tether and the pose of the mobile unit, the issue of self-entanglement arises when the robot moves, resulting in unsafe robot movements and the risk of damaging the tether. In this paper, the SEF constraint is first formulated as the boundedness of a relative angle function which characterises the angular difference between the tether stretching direction and the robot's heading direction. Then, a constrained search-based path planning algorithm is proposed which produces a path that is sub-optimal whilst ensuring the avoidance of tether self-entanglement. Finally, the algorithmic efficiency of the proposed path planner is further enhanced by proving the conditioned sparsity of the primitive path validity checking module. The effectiveness of the proposed algorithm is assessed through case studies, comparing its performance against untethered differential-driven planners in challenging planning scenarios. A comparative analysis is further conducted between the normal node expansion module and the improved node expansion module which incorporates sparse waypoint validity checking. Real-world tests are also conducted to validate the algorithm's performance. An open-source implementation has also been made available for the benefit of the robotics community.

* This is a generalised version of the authors' ICRA23 conference paper
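The SEF constraint described in the abstract above, a bound on the angle between the tether stretching direction and the robot's heading, can be sketched as follows. This is a simplified illustration, not the paper's formulation: it assumes the tether direction is the bearing from the robot to its most recent tether contact point, wraps the angle at each waypoint instead of tracking an accumulated value, and uses hypothetical names (`relative_angle`, `is_sef`, `bound`).

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def relative_angle(pose, contact):
    """Angle from the robot heading to the tether direction.

    `pose` is (x, y, heading); `contact` is the (x, y) of the last tether
    contact point (assumption: the tether stretches from the robot to it).
    """
    x, y, heading = pose
    tether_dir = math.atan2(contact[1] - y, contact[0] - x)
    return wrap_angle(tether_dir - heading)


def is_sef(pose, contact, bound):
    """Return True if the relative angle stays within +/- bound radians."""
    return abs(relative_angle(pose, contact)) <= bound


# Tether aligned with the heading: within any non-negative bound.
aligned_ok = is_sef((0.0, 0.0, 0.0), (1.0, 0.0), math.pi / 2)
# Tether at 90 degrees to the heading: violates a 45-degree bound.
side_ok = is_sef((0.0, 0.0, 0.0), (0.0, 1.0), math.pi / 4)
```

A planner could call a check like `is_sef` on candidate waypoints during node expansion and discard those that violate the bound; how the paper selects which waypoints to check is what its sparsity result addresses and is not modelled here.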
