Abstract:Embodied navigation stands as a foundational pillar within the broader pursuit of embodied AI. However, previous navigation research is divided into separate tasks/capabilities, e.g., ObjNav, ImgNav, and VLN, which differ in task objectives and modalities, so datasets and methods are designed individually for each. In this work, we take steps toward generalist navigation agents that can follow free-form instructions combining arbitrary modalities and capabilities. To achieve this, we propose a large-scale benchmark and a corresponding method, termed OctoNav-Bench and OctoNav-R1. Specifically, OctoNav-Bench features continuous environments and is constructed via a designed annotation pipeline. We thoroughly craft instruction-trajectory pairs, where instructions are diverse and free-form, spanning arbitrary modalities and capabilities. We also construct a Think-Before-Action Chain-of-Thought (TBA-CoT) dataset within OctoNav-Bench to provide the thinking process behind actions. For OctoNav-R1, we build it upon MLLMs and adapt it into a VLA-type model, which produces low-level actions solely from 2D visual observations. Moreover, we design a Hybrid Training Paradigm (HTP) consisting of three stages, i.e., Action-/TBA-SFT, Nav-GRPO, and Online RL, each with specifically designed learning policies and rewards. Importantly, the TBA-SFT and Nav-GRPO designs are inspired by OpenAI-o1 and DeepSeek-R1, which show impressive reasoning ability via thinking before answering. We therefore investigate how to achieve thinking-before-action in embodied navigation, so as to improve the model's reasoning ability toward generalist agents. Specifically, we propose TBA-SFT, which uses the TBA-CoT dataset to fine-tune the model as a cold-start phase, and then leverage Nav-GRPO to further improve its thinking ability. Finally, OctoNav-R1 shows superior performance compared with previous methods.
Abstract:Autonomous Driving (AD) systems demand high levels of safety assurance. Despite significant advancements in AD demonstrated on open-source benchmarks like Longest6 and Bench2Drive, existing datasets still lack regulatory-compliant scenario libraries for closed-loop testing to comprehensively evaluate the functional safety of AD. Meanwhile, real-world AD accidents are underrepresented in current driving datasets. This scarcity leads to inadequate evaluation of AD performance, posing risks to safety validation and practical deployment. To address these challenges, we propose Safety2Drive, a safety-critical scenario library designed to evaluate AD systems. Safety2Drive offers three key contributions. (1) Safety2Drive comprehensively covers the test items required by standard regulations and contains 70 AD function test items. (2) Safety2Drive supports safety-critical scenario generalization, with the ability to inject safety threats such as natural environment corruptions and adversarial attacks across camera and LiDAR sensors. (3) Safety2Drive supports multi-dimensional evaluation. In addition to the evaluation of AD systems, it also supports the evaluation of various perception tasks, such as object detection and lane detection. Safety2Drive provides a paradigm from scenario construction to validation, establishing a standardized test framework for the safe deployment of AD.
Abstract:In this article, we investigate the robust beamforming design for a simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) aided downlink rate-splitting multiple access (RSMA) communication system, where both the transceivers and the STAR-RIS suffer from the impact of hardware impairments (HWI). A base station (BS) is deployed to transmit messages concurrently to multiple users, utilizing a STAR-RIS to improve communication quality and expand user coverage. We aim to maximize the achievable sum rate of the users while ensuring the constraints on the transmit power, the STAR-RIS coefficients, and the actual rate of the common stream for all users. To solve this challenging, highly coupled, and non-convex problem, we adopt a fractional programming (FP)-based alternating optimization (AO) approach, where each sub-problem is addressed via semidefinite relaxation (SDR) and successive convex approximation (SCA) methods. Numerical results demonstrate that the proposed scheme outperforms other multiple access schemes and conventional passive RIS in terms of the achievable sum rate. Additionally, accounting for the HWI of the transceivers and the STAR-RIS makes our algorithm more robust than designs that neglect these impairments.
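The abstract does not spell out the FP reformulation, so the following is only a standard reference point (an assumed generic form, not the paper's exact derivation): the quadratic transform decouples each ratio term, letting the AO alternate between a closed-form auxiliary-variable update and a convexified beamforming subproblem handled by SDR/SCA. For $A(\mathbf{x}) \ge 0$ and $B(\mathbf{x}) > 0$,

$$
\max_{\mathbf{x}}\ \frac{A(\mathbf{x})^{2}}{B(\mathbf{x})}
\quad\Longleftrightarrow\quad
\max_{\mathbf{x},\,y}\ 2\,y\,A(\mathbf{x}) - y^{2}\,B(\mathbf{x}),
\qquad
y^{\star} = \frac{A(\mathbf{x})}{B(\mathbf{x})},
$$

where, with $y$ fixed, the remaining non-convexity in $\mathbf{x}$ (e.g., the STAR-RIS coefficient constraints) is what the SDR and SCA steps are invoked for.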
Abstract:The task of LiDAR-based 3D Open-Vocabulary Detection (3D OVD) requires the detector to learn to detect novel objects from point clouds without off-the-shelf training labels. Previous methods focus on learning object-level representations and ignore scene-level information, making it hard to distinguish objects of similar classes. In this work, we propose a Global-Local Collaborative Reason and Debate with PSL (GLRD) framework for the 3D OVD task, considering both local object-level information and global scene-level information. Specifically, an LLM is utilized to perform common-sense reasoning based on object-level and scene-level information, and the detection result is refined accordingly. To further boost the LLM's ability to make precise decisions, we also design a probabilistic soft logic solver (OV-PSL) to search for the optimal solution, and a debate scheme to confirm the class of confusable objects. In addition, to alleviate the uneven distribution of classes, a static balance scheme (SBC) and a dynamic balance scheme (DBC) are designed. Moreover, to reduce the influence of noise in data and training, we further propose Reflected Pseudo Labels Generation (RPLG) and Background-Aware Object Localization (BAOL). Extensive experiments conducted on ScanNet and SUN RGB-D demonstrate the superiority of GLRD, with absolute improvements in mean average precision of $+2.82\%$ on SUN RGB-D and $+3.72\%$ on ScanNet in the partial open-vocabulary setting. In the full open-vocabulary setting, the absolute improvements in mean average precision are $+4.03\%$ on ScanNet and $+14.11\%$ on SUN RGB-D.
Abstract:Traditional methods for detecting rumors on social media primarily focus on analyzing textual content, often struggling to capture the complexity of online interactions. Recent research has shifted towards leveraging graph neural networks to model the hierarchical conversation structure that emerges during rumor propagation. However, these methods tend to overlook the temporal aspect of rumor propagation and may disregard potential noise within the propagation structure. In this paper, we propose a novel approach that incorporates temporal information by constructing a weighted propagation tree, where the weight of each edge represents the time interval between connected posts. Drawing upon the theory of structural entropy, we transform this tree into a coding tree. This transformation aims to preserve the essential structure of rumor propagation while reducing noise. Finally, we introduce a recursive neural network to learn from the coding tree for rumor veracity prediction. Experimental results on two common datasets demonstrate the superiority of our approach.
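As a minimal sketch of the first step described above (the data and function names are illustrative, not from the paper), the weighted propagation tree can be built by attaching each reply/repost to its parent with the elapsed time as the edge weight:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Post:
    post_id: str
    parent_id: Optional[str]  # None for the source (root) post
    timestamp: float          # posting time in seconds

def build_weighted_propagation_tree(posts: List[Post]) -> Dict[str, List[Tuple[str, float]]]:
    """Adjacency list: parent_id -> [(child_id, edge_weight), ...],
    where each edge weight is the time interval between connected posts."""
    by_id = {p.post_id: p for p in posts}
    tree: Dict[str, List[Tuple[str, float]]] = {}
    for p in posts:
        if p.parent_id is None:
            continue  # the root post has no incoming edge
        parent = by_id[p.parent_id]
        weight = max(p.timestamp - parent.timestamp, 0.0)
        tree.setdefault(p.parent_id, []).append((p.post_id, weight))
    return tree
```

The subsequent compression of this tree into a coding tree via structural entropy, and the recursive neural network over it, are paper-specific and not sketched here.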
Abstract:In this paper, we investigate integrated sensing and communication (ISAC) in high-mobility systems with the aid of an intelligent reflecting surface (IRS). To exploit the benefits of the Delay-Doppler (DD) spread caused by high mobility, an orthogonal time frequency space (OTFS)-based frame structure and transmission framework are proposed. In such a framework, we first design a low-complexity ratio-based sensing algorithm for estimating the velocity of the mobile user. Then, we analyze the performance of sensing and communication in terms of the achievable mean square error (MSE) and achievable rate, respectively, and reveal the impact of key parameters. Next, with the derived performance expressions, we jointly optimize the phase shift matrix of the IRS and the receive combining vector at the base station (BS) to improve the overall performance of integrated sensing and communication. Finally, extensive simulation results confirm the effectiveness of the proposed algorithms in high-mobility systems.
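As general background for the velocity-sensing step (the abstract does not give the ratio-based estimator itself, so this is only the textbook relation it builds on), the Doppler shift observed on a carrier of frequency $f_{\mathrm{c}}$ for a user moving at speed $v$ along a line-of-sight path is

$$
f_{\mathrm{d}} = \frac{v\cos\theta}{c}\, f_{\mathrm{c}}
\qquad\Longrightarrow\qquad
\hat{v} = \frac{c\,\hat{f}_{\mathrm{d}}}{f_{\mathrm{c}}\cos\theta},
$$

where $\theta$ is the angle between the motion direction and the propagation path and $c$ is the speed of light; the paper's OTFS/DD-domain ratio-based estimator refines this idea, but its exact form is not specified in the abstract.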
Abstract:Open-Vocabulary Detection (OVD) is the task of detecting all objects of interest in a given scene without predefined object classes. Extensive work has been done on OVD for 2D RGB images, but the exploration of 3D OVD is still limited. Intuitively, LiDAR point clouds provide 3D information at both the object level and the scene level, which can support trustworthy detection results. However, previous LiDAR-based OVD methods only focus on the usage of object-level features, ignoring the essential scene-level information. In this paper, we propose a Global-Local Collaborative Scheme (GLIS) for the LiDAR-based OVD task, which contains a local branch to generate object-level detection results and a global branch to obtain a scene-level global feature. With the global-local information, a Large Language Model (LLM) is applied for chain-of-thought inference, and the detection result can be refined accordingly. We further propose Reflected Pseudo Labels Generation (RPLG) to generate high-quality pseudo labels for supervision and Background-Aware Object Localization (BAOL) to select precise object proposals. Extensive experiments on ScanNetV2 and SUN RGB-D demonstrate the superiority of our method. Code is released at https://github.com/GradiusTwinbee/GLIS.
Abstract:Vehicle-to-everything (V2X) autonomous driving opens up a promising direction for developing a new generation of intelligent transportation systems. Collaborative perception (CP), as an essential component of V2X, can overcome the inherent limitations of individual perception, including occlusion and limited long-range perception. In this survey, we provide a comprehensive review of CP methods for V2X scenarios, bringing a profound and in-depth understanding to the community. Specifically, we first introduce the architecture and workflow of typical V2X systems, which affords a broader perspective for understanding the entire V2X system and the role of CP within it. Then, we thoroughly summarize and analyze existing V2X perception datasets and CP methods. In particular, we introduce numerous CP methods from various crucial perspectives, including collaboration stages, roadside sensor placement, latency compensation, the performance-bandwidth trade-off, attack/defense, pose alignment, etc. Moreover, we conduct extensive experimental analyses to compare and examine current CP methods, revealing some essential and unexplored insights. Specifically, we analyze the performance changes of different methods under different bandwidths, providing a deep insight into the performance-bandwidth trade-off issue. We also examine methods under different LiDAR ranges. To study model robustness, we further investigate the effects of various simulated real-world noises on the performance of different CP methods, covering communication latency, lossy communication, localization errors, and mixed noises. In addition, we look into the sim-to-real generalization ability of existing CP methods. Finally, we thoroughly discuss issues and challenges, highlighting promising directions for future efforts. Our code for experimental analysis will be made public at https://github.com/memberRE/Collaborative-Perception.
Abstract:Public opinion is a crucial factor in shaping political decision-making. Nowadays, social media has become an essential platform for individuals to engage in political discussions and express their political views, presenting researchers with an invaluable resource for analyzing public opinion. In this paper, we focus on the 2020 US presidential election and create a large-scale dataset from Twitter. To detect political opinions in tweets, we build a user-tweet bipartite graph based on users' posting and retweeting behaviors and convert the task into a Graph Neural Network (GNN)-based node classification problem. Then, we introduce a novel skip aggregation mechanism that makes tweet nodes aggregate information from second-order neighbors, which are also tweet nodes due to the graph's bipartite nature, effectively leveraging user behavioral information. The experimental results show that our proposed model significantly outperforms several competitive baselines. Further analyses demonstrate the significance of user behavioral information and the effectiveness of skip aggregation.
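A minimal sketch of the skip-aggregation idea described above (the tensor names and mean normalization are assumptions, not the paper's exact layer): each tweet node pools features from the other tweets it reaches through a shared user, i.e., its second-order neighbors in the bipartite graph.

```python
import torch

def skip_aggregate(tweet_feats: torch.Tensor, adj_tu: torch.Tensor) -> torch.Tensor:
    """
    tweet_feats: [num_tweets, d] tweet node features
    adj_tu:      [num_tweets, num_users] incidence matrix (tweet posted/retweeted by user)
    Returns mean-pooled features over each tweet's second-order (tweet) neighbors.
    """
    # Two-hop adjacency: tweets connected through a shared user in the bipartite graph
    adj_tt = adj_tu @ adj_tu.t()                        # [num_tweets, num_tweets]
    adj_tt.fill_diagonal_(0)                            # exclude the tweet itself
    deg = adj_tt.sum(dim=1, keepdim=True).clamp(min=1)  # guard against isolated tweets
    return (adj_tt @ tweet_feats) / deg
```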
Abstract:Given the development and abundance of social media, studying the stance of social media users is a challenging and pressing issue. Social media users express their stance by posting tweets and retweeting. Therefore, the homogeneous relationships between users and the heterogeneous relationships between users and tweets are both relevant to the stance detection task. Recently, graph neural networks (GNNs) have developed rapidly and have been applied to social media research. In this paper, we crawl a large-scale dataset of the 2020 US presidential election and automatically label all users using manually tagged hashtags. We then propose a bipartite graph neural network model, DoubleH, which aims to better utilize homogeneous and heterogeneous information in user stance detection tasks. Specifically, we first construct a bipartite graph based on posting and retweeting relations for two kinds of nodes, namely users and tweets. We then iteratively update each node's representation by extracting and separately processing the heterogeneous and homogeneous information in its neighborhood. Finally, the representations of user nodes are used for user stance classification. Experimental results show that DoubleH outperforms the state-of-the-art methods on popular benchmarks. Further analysis illustrates the model's utilization of information and demonstrates its stability and efficiency across different numbers of layers.
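The following is an illustrative sketch, not the published DoubleH layer: a user node combines heterogeneous messages (from its neighboring tweet nodes) and homogeneous messages (from other users reached through shared tweets), each through its own projection, before updating its representation. All module and tensor names here are assumptions.

```python
import torch
import torch.nn as nn

class UserUpdateSketch(nn.Module):
    """Hypothetical user-node update that separates heterogeneous and homogeneous messages."""
    def __init__(self, dim: int):
        super().__init__()
        self.hetero_proj = nn.Linear(dim, dim)  # messages from tweet neighbors (user-tweet edges)
        self.homo_proj = nn.Linear(dim, dim)    # messages from user neighbors (via shared tweets)
        self.update = nn.Linear(3 * dim, dim)

    def forward(self, user_x: torch.Tensor, tweet_x: torch.Tensor, adj_ut: torch.Tensor) -> torch.Tensor:
        # adj_ut: [num_users, num_tweets] posting/retweeting incidence matrix
        deg_t = adj_ut.sum(1, keepdim=True).clamp(min=1)
        hetero = self.hetero_proj((adj_ut @ tweet_x) / deg_t)   # mean over a user's tweets
        adj_uu = adj_ut @ adj_ut.t()                            # users sharing at least one tweet
        adj_uu.fill_diagonal_(0)
        deg_u = adj_uu.sum(1, keepdim=True).clamp(min=1)
        homo = self.homo_proj((adj_uu @ user_x) / deg_u)        # mean over co-engaging users
        return torch.relu(self.update(torch.cat([user_x, hetero, homo], dim=-1)))
```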