Peng Ren

EgoPriMo: Egocentric Motion Generation for Interactive Humanoid Control

Jun 07, 2026

Haoyang Ge, Peng Ren, Yukun Shi, Cong Huang, Kun Li, Kai Chen

Abstract:Humanoid robots require whole-body motions that adapt to scene context, task requirements, and user intent. Motion tracking reproduces specified trajectories, and humanoid vision-language-action systems provide semantic interfaces, but neither offers a scalable and interactive prior for broad full-body behavior. We introduce EgoPriMo (Egocentric Motion Prior for Humanoid Robots), a unified framework that learns such priors from egocentric human demonstrations. Given egocentric observations and a text prompt, EgoPriMo reconstructs, generates, and forecasts SMPL-based full-body motion. Language is used as a high-level control signal rather than a complete motion specification. At the core of EgoPriMo is a Triple-stream DiT that jointly models body dynamics, egocentric visual context, and text; task-conditioning masks route different tasks and missing-modality data through the same checkpoint. Experiments on Nymeria and EgoExo4D show that one checkpoint improves egocentric motion generation over UniEgoMotion while supporting reconstruction and forecasting; the generated SMPL motions can also be executed by a Unitree humanoid controller. These results indicate a practical path from scalable egocentric observations to generalizable and interactive humanoid motion priors.

CyboRacket: A Perception-to-Action Framework for Humanoid Racket Sports

Mar 15, 2026

Peng Ren, Chuan Qi, Haoyang Ge, Qiyuan Su, Xuguo He, Cong Huang, Pei Chi, Jiang Zhao, Kai Chen

Abstract:Dynamic ball-interaction tasks remain challenging for robots because they require tight perception-action coupling under limited reaction time. This challenge is especially pronounced in humanoid racket sports, where successful interception depends on accurate visual tracking, trajectory prediction, coordinated stepping, and stable whole-body striking. Existing robotic racket-sport systems often rely on external motion capture for state estimation or on task-specific low-level controllers that must be retrained across tasks and platforms. We present CyboRacket, a hierarchical perception-to-action framework for humanoid racket sports that integrates onboard visual perception, physics-based trajectory prediction, and large-scale pre-trained whole-body control. The framework uses onboard cameras to track the incoming object, predicts its future trajectory, and converts the estimated interception state into target end-effector and base-motion commands for whole-body execution by SONIC on the Unitree G1 humanoid robot. We evaluate the proposed framework in a vision-based humanoid tennis-hitting task. Experimental results demonstrate real-time visual tracking, trajectory prediction, and successful striking using purely onboard sensing.
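The prediction step described above can be illustrated with a minimal sketch. It assumes a drag-free, spin-free ballistic model and a vertical interception plane at a fixed `x`; the abstract does not specify the physics model, so the function name `predict_interception`, the plane convention, and the coordinate frame are all hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2 (z axis points up)


def predict_interception(pos, vel, x_hit):
    """Predict where a ball crosses the plane x = x_hit.

    pos, vel: (x, y, z) position in metres and velocity in m/s.
    Returns (t, y, z) at the crossing, or None if the ball never
    reaches the plane. Drag, spin, and bounces are ignored (assumption).
    """
    px, py, pz = pos
    vx, vy, vz = vel
    if vx == 0.0:
        return None  # no motion toward the plane
    t = (x_hit - px) / vx
    if t < 0.0:
        return None  # the plane is behind the ball
    y = py + vy * t
    z = pz + vz * t - 0.5 * G * t * t
    return t, y, z
```

In a pipeline of this shape, the returned time and position would be the interception state that is then turned into end-effector and base-motion targets.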

Cybo-Waiter: A Physical Agentic Framework for Humanoid Whole-Body Locomotion-Manipulation

Mar 11, 2026

Peng Ren, Haoyang Ge, Chuan Qi, Cong Huang, Hong Li, Jiang Zhao, Pei Chi, Kai Chen

Abstract:Robots are increasingly expected to execute open-ended natural-language requests in human environments, which demands reliable long-horizon execution under partial observability. This is especially challenging for humanoids because locomotion and manipulation are tightly coupled through stance, reachability, and balance. We present a humanoid agent framework that turns VLM plans into verifiable task programs and closes the loop with multi-object 3D geometric supervision. A VLM planner compiles each instruction into a typed JSON sequence of subtasks with explicit predicate-based preconditions and success conditions. Using SAM3 and RGB-D, we ground all task-relevant entities in 3D, estimate object centroids and extents, and evaluate predicates over stable frames to obtain condition-level diagnostics. The supervisor uses these diagnostics to verify subtask completion and to provide condition-level feedback for progression and replanning. We execute each subtask by coordinating humanoid locomotion and whole-body manipulation, selecting feasible motion primitives under reachability and balance constraints. Experiments on tabletop manipulation and long-horizon humanoid loco-manipulation tasks show improved robustness from multi-object grounding, temporal stability, and recovery-driven replanning.
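To make the idea of predicate-based success conditions concrete, here is a minimal sketch. The JSON field names, the `on` predicate, its tolerance, and the "last k frames" stability rule are all assumptions for illustration; the abstract does not give the actual schema or predicate definitions.

```python
import json

# Hypothetical subtask schema; field names are illustrative only.
SUBTASK_JSON = """
{
  "name": "place_cup_on_tray",
  "preconditions": [{"pred": "holding", "args": ["cup"]}],
  "success": [{"pred": "on", "args": ["cup", "tray"]}]
}
"""


def on(scene, a, b, tol=0.03):
    """True if a's centroid lies over b and a's bottom is near b's top."""
    (ax, ay, az), (_, _, ah) = scene[a]["centroid"], scene[a]["extent"]
    (bx, by, bz), (bw, bd, bh) = scene[b]["centroid"], scene[b]["extent"]
    inside = abs(ax - bx) <= bw / 2 and abs(ay - by) <= bd / 2
    touching = abs((az - ah / 2) - (bz + bh / 2)) <= tol
    return inside and touching


# Only the predicate needed for the success check is implemented here.
PREDICATES = {"on": on}


def holds_stably(cond, frames, k=3):
    """A condition holds only if it is true in each of the last k frames."""
    pred = PREDICATES[cond["pred"]]
    recent = frames[-k:]
    return len(recent) == k and all(pred(f, *cond["args"]) for f in recent)


def check_success(subtask, frames, k=3):
    """Return one diagnostic per success condition, keyed by its text form."""
    return {
        f"{c['pred']}({', '.join(c['args'])})": holds_stably(c, frames, k)
        for c in subtask["success"]
    }


subtask = json.loads(SUBTASK_JSON)
```

Each frame is a dictionary mapping an object name to its estimated `centroid` and `extent` in metres. Reporting a result per condition, rather than a single pass/fail flag, is what would let a supervisor say which condition failed when it asks for a replan.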

OilSAM2: Memory-Augmented SAM2 for Scalable SAR Oil Spill Detection

Mar 10, 2026

Shuaiyu Chen, Ming Yin, Peng Ren, Chunbo Luo, Zeyu Fu

Abstract:Segmenting oil spills from Synthetic Aperture Radar (SAR) imagery remains challenging due to severe appearance variability, scale heterogeneity, and the absence of temporal continuity in real-world monitoring scenarios. While foundation models such as Segment Anything (SAM) enable prompt-driven segmentation, existing SAM-based approaches operate on single images and cannot effectively reuse information across scenes. Memory-augmented variants (e.g., SAM2) further assume temporal coherence, making them prone to semantic drift when applied to unordered SAR image collections. We propose OilSAM2, a memory-augmented segmentation framework tailored for unordered SAR oil spill monitoring. OilSAM2 introduces a hierarchical feature-aware multi-scale memory bank that explicitly models texture, structure, and semantic-level representations, enabling robust cross-image information reuse. To mitigate memory drift, we further propose a structure-semantic consistent memory update strategy that selectively refreshes memory based on semantic discrepancy and structural variation. Experiments on two public SAR oil spill datasets demonstrate that OilSAM2 achieves state-of-the-art segmentation performance, delivering stable and accurate results under noisy SAR monitoring scenarios. The source code is available at https://github.com/Chenshuaiyu1120/OILSAM2.

Accurate Pedestrian Tracking in Urban Canyons: A Multi-Modal Fusion Approach

Jan 29, 2026

Shahar Dubiner, Peng Ren, Roberto Manduchi

Abstract:The contribution describes a pedestrian navigation approach designed to improve localization accuracy in urban environments where GNSS performance is degraded, a problem that is especially critical for blind or low-vision users who depend on precise guidance such as identifying the correct side of a street. To address GNSS limitations and the impracticality of camera-based visual positioning, the work proposes a particle-filter-based fusion of GNSS and inertial data that incorporates spatial priors from maps, such as impassable buildings and unlikely walking areas, functioning as a probabilistic form of map matching. Inertial localization is provided by the RoNIN machine learning method, and fusion with GNSS is achieved by weighting particles based on their consistency with GNSS estimates and uncertainty. The system was evaluated on six challenging walking routes in downtown San Francisco using three metrics related to sidewalk correctness and localization error. Results show that the fused approach (GNSS+RoNIN+PF) significantly outperforms GNSS-only localization on most metrics, while inertial-only localization with particle filtering also surpasses GNSS alone for critical measures such as sidewalk assignment and across-street error.
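The particle weighting described above can be sketched as follows. This is a minimal illustration, assuming an isotropic Gaussian GNSS likelihood and a map prior supplied as a function returning a multiplier in [0, 1]; the names `reweight` and `map_prior` are hypothetical, and the paper's actual likelihood and prior may differ.

```python
import math


def reweight(particles, weights, gnss_xy, gnss_sigma, map_prior):
    """Update particle weights from one GNSS fix and a map prior.

    particles: list of (x, y) positions in metres.
    weights: current weights, same length as particles.
    gnss_xy, gnss_sigma: GNSS position estimate and its reported
        uncertainty (standard deviation, metres).
    map_prior: callable (x, y) -> value in [0, 1]; 0 for impassable
        cells such as buildings, small for unlikely walking areas.
    """
    gx, gy = gnss_xy
    updated = []
    for (x, y), w in zip(particles, weights):
        d2 = (x - gx) ** 2 + (y - gy) ** 2
        likelihood = math.exp(-0.5 * d2 / gnss_sigma ** 2)
        updated.append(w * likelihood * map_prior(x, y))
    total = sum(updated)
    if total == 0.0:
        # Every particle was ruled out; fall back to uniform weights.
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in updated]
```

A larger `gnss_sigma` flattens the likelihood, so a poor GNSS fix in an urban canyon pulls the particles less and the inertial motion model plus map prior dominate.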

Enhanced Short Text Modeling: Leveraging Large Language Models for Topic Refinement

Mar 26, 2024

Shuyu Chang, Rui Wang, Peng Ren, Haiping Huang

Abstract:Crafting effective topic models for brief texts, like tweets and news headlines, is essential for capturing the swift shifts in social dynamics. Traditional topic models, however, often fall short in accurately representing the semantic intricacies of short texts due to their brevity and lack of contextual data. In our study, we harness the advanced capabilities of Large Language Models (LLMs) to introduce a novel approach termed "Topic Refinement". This approach does not directly involve itself in the initial modeling of topics but focuses on improving topics after they have been mined. By employing prompt engineering, we direct LLMs to eliminate off-topic words within a given topic, ensuring that only contextually relevant words are preserved or substituted with ones that fit better semantically. This method emulates human-like scrutiny and improvement of topics, thereby elevating the semantic quality of the topics generated by various models. Our comprehensive evaluation across three unique datasets has shown that our topic refinement approach significantly enhances the semantic coherence of topics.

* 6 pages, 4 figures

SRCNet: Seminal Representation Collaborative Network for Marine Oil Spill Segmentation

Apr 17, 2023

Fang Chen, Heiko Balzter, Peng Ren, Huiyu Zhou

Abstract:Effective oil spill segmentation in Synthetic Aperture Radar (SAR) images is critical for marine oil pollution cleanup, and proper image representation is helpful for accurate image segmentation. In this paper, we propose an effective oil spill image segmentation network named SRCNet, which learns the SAR image representation and the oil spill segmentation simultaneously. Specifically, our proposed segmentation network is constructed with a pair of deep neural nets that collaborate with a seminal representation describing SAR images. One deep neural net is the generative net, which strives to produce oil spill segmentation maps, and the other is the discriminative net, which tries its best to distinguish between the produced and the true segmentations; the two thus form a two-player game. In particular, the seminal representation exploited in SRCNet originates from SAR imagery, modelling the internal characteristics of SAR images. Thus, in the training process, the collaborating seminal representation enables the generative net to produce accurate oil spill segmentation maps efficiently with a small amount of training data, helping the discriminative net reach its optimal solution quickly. Therefore, SRCNet performs effective oil spill segmentation in an economical and efficient manner. Additionally, to increase the capability of the network to accurately delineate oil spill details in SAR images, a regularisation term that penalises the segmentation loss is devised. This encourages SRCNet to segment oil spill areas from SAR images accurately. Empirical evaluations with different metrics validate the effectiveness of SRCNet for oil spill image segmentation.

* arXiv admin note: substantial text overlap with arXiv:2301.01202

DGNet: Distribution Guided Efficient Learning for Oil Spill Image Segmentation

Dec 19, 2022

Fang Chen, Heiko Balzter, Feixiang Zhou, Peng Ren, Huiyu Zhou

Abstract:Successful implementation of oil spill segmentation in Synthetic Aperture Radar (SAR) images is vital for marine environmental protection. In this paper, we develop an effective segmentation framework named DGNet, which performs oil spill segmentation by incorporating the intrinsic distribution of backscatter values in SAR images. Specifically, our proposed segmentation network is constructed with two deep neural modules running in an interactive manner, where one is the inference module, which infers latent feature variables from SAR images, and the other is the generative module, which produces oil spill segmentation maps by taking the latent feature variables as inputs. To yield accurate segmentation, we take into account the intrinsic distribution of backscatter values in SAR images and embed it in our segmentation model. The intrinsic distribution originates from SAR imagery and describes the physical characteristics of oil spills. In the training process, the formulated intrinsic distribution guides efficient learning of optimal latent feature variable inference for oil spill segmentation. This efficient learning enables the training of DGNet with a small amount of image data, which is economically beneficial for oil spill segmentation, where the availability of oil spill SAR image data is limited in practice. Additionally, benefiting from optimal latent feature variable inference, DGNet performs accurate oil spill segmentation. We evaluate the segmentation performance of DGNet with different metrics, and the experimental evaluations demonstrate its effectiveness.

Representation Learning of Knowledge Graph for Wireless Communication Networks

Aug 22, 2022

Shiwen He, Yeyu Ou, Liangpeng Wang, Hang Zhan, Peng Ren, Yongming Huang

Abstract:With the application of fifth-generation wireless communication technologies, more smart terminals are being used and are generating huge amounts of data, which has prompted extensive research on how to handle and utilize these wireless data. Current research focuses on upper-layer application data or on intelligent transmission methods for specific problems, based on large amounts of data generated by Monte Carlo simulations. This article aims to understand the endogenous relationship of wireless data by constructing a knowledge graph from wireless communication protocols and domain expert knowledge, and to further investigate wireless endogenous intelligence. We first construct a knowledge graph of the endogenous factors of wireless core network data collected via a 5G/B5G testing network. Then, a novel model based on graph convolutional neural networks is designed to learn the representation of the graph, which is used to classify graph nodes and simulate relation prediction. The proposed model achieves automatic node classification and network anomaly cause tracing. It is also applied to public datasets in an unsupervised manner. Finally, the results show that the classification accuracy of the proposed model is better than that of existing unsupervised graph neural network models, such as VGAE and ARVGE.

Learning Optimal Treatment Strategies for Sepsis Using Offline Reinforcement Learning in Continuous Space

Jun 22, 2022

Zeyu Wang, Huiying Zhao, Peng Ren, Yuxi Zhou, Ming Sheng

Abstract:Sepsis is a leading cause of death in the ICU. It is a disease requiring complex interventions in a short period of time, but its optimal treatment strategy remains uncertain. Evidence suggests that the practices of currently used treatment strategies are problematic and may cause harm to patients. To address this decision problem, we propose a new medical decision model based on historical data to help clinicians recommend the best reference option for real-time treatment. Our model combines offline reinforcement learning with deep reinforcement learning to address the problem that traditional reinforcement learning in healthcare cannot interact with the environment, enabling our model to make decisions in a continuous state-action space. We demonstrate that, on average, the treatments recommended by the model are more valuable and reliable than those recommended by clinicians. In a large validation dataset, we found that patients whose actual doses from clinicians matched the AI's decisions had the lowest mortality rates. Our model provides personalized, clinically interpretable treatment decisions for sepsis that can improve patient care.
