Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xinhu Zheng

Relaying Signal When Monitoring Traffic: Double Use of Aerial Vehicles Towards Intelligent Low-Altitude Networking

Dec 16, 2025

Jiahui Liang, Wenlihan Lu, Tianyi Liu, Kang Kang, Guixin Pan, Liuqing Yang, Xinhu Zheng, Shijian Gao

Abstract:In intelligent low-altitude networks, integrating monitoring tasks into communication unmanned aerial vehicles (UAVs) can consume resources and increase handoff latency for communication links. To address this challenge, we propose a strategy that enables a "double use" of UAVs, unifying the monitoring and relay handoff functions into a single, efficient process. Our scheme, guided by an integrated sensing and communication framework, coordinates these multi-role UAVs through a proactive handoff network that fuses multi-view sensory data from aerial and ground vehicles. A lightweight vehicle inspection module and a two-stage training procedure are developed to ensure monitoring accuracy and collaborative efficiency. Simulation results demonstrate the effectiveness of this integrated approach: it reduces communication outage probability by nearly 10% at a 200 Mbps requirement without compromising monitoring performance and maintains high resilience (86% achievable rate) even in the absence of multiple UAVs, outperforming traditional ground-based handoff schemes. Our code is available at the https://github.com/Jiahui-L/UAP.

Via

Access Paper or Ask Questions

An Intention-driven Lane Change Framework Considering Heterogeneous Dynamic Cooperation in Mixed-traffic Environment

Sep 26, 2025

Xiaoyun Qiu, Haichao Liu, Yue Pan, Jun Ma, Xinhu Zheng

Figure 1 for An Intention-driven Lane Change Framework Considering Heterogeneous Dynamic Cooperation in Mixed-traffic Environment

Figure 2 for An Intention-driven Lane Change Framework Considering Heterogeneous Dynamic Cooperation in Mixed-traffic Environment

Figure 3 for An Intention-driven Lane Change Framework Considering Heterogeneous Dynamic Cooperation in Mixed-traffic Environment

Figure 4 for An Intention-driven Lane Change Framework Considering Heterogeneous Dynamic Cooperation in Mixed-traffic Environment

Abstract:In mixed-traffic environments, where autonomous vehicles (AVs) interact with diverse human-driven vehicles (HVs), unpredictable intentions and heterogeneous behaviors make safe and efficient lane change maneuvers highly challenging. Existing methods often oversimplify these interactions by assuming uniform patterns. We propose an intention-driven lane change framework that integrates driving-style recognition, cooperation-aware decision-making, and coordinated motion planning. A deep learning classifier trained on the NGSIM dataset identifies human driving styles in real time. A cooperation score with intrinsic and interactive components estimates surrounding drivers' intentions and quantifies their willingness to cooperate with the ego vehicle. Decision-making combines behavior cloning with inverse reinforcement learning to determine whether a lane change should be initiated. For trajectory generation, model predictive control is integrated with IRL-based intention inference to produce collision-free and socially compliant maneuvers. Experiments show that the proposed model achieves 94.2\% accuracy and 94.3\% F1-score, outperforming rule-based and learning-based baselines by 4-15\% in lane change recognition. These results highlight the benefit of modeling inter-driver heterogeneity and demonstrate the potential of the framework to advance context-aware and human-like autonomous driving in complex traffic environments.

Via

Access Paper or Ask Questions

Exploring Self-Supervised Audio Models for Generalized Anomalous Sound Detection

Aug 17, 2025

Bing Han, Anbai Jiang, Xinhu Zheng, Wei-Qiang Zhang, Jia Liu, Pingyi Fan, Yanmin Qian

Abstract:Machine anomalous sound detection (ASD) is a valuable technique across various applications. However, its generalization performance is often limited due to challenges in data collection and the complexity of acoustic environments. Inspired by the success of large pre-trained models in numerous fields, this paper introduces a robust ASD model that leverages self-supervised pre-trained models trained on large-scale speech and audio datasets. Although there are inconsistencies between the pre-training datasets and the ASD task, our findings indicate that pre-training still provides substantial benefits for ASD. To mitigate overfitting and retain learned knowledge when fine-tuning with limited data, we explore Fully-Connected Low-Rank Adaptation (LoRA) as an alternative to full fine-tuning. Additionally, we propose a Machine-aware Group Adapter module, which enables the model to capture differences between various machines within a unified framework, thereby enhancing the generalization performance of ASD systems. To address the challenge of missing attribute labels, we design a novel objective function that dynamically clusters unattributed data using vector quantization and optimizes through a dual-level contrastive learning loss. The proposed methods are evaluated on all benchmark datasets, including the DCASE 2020-2024 five ASD challenges, and the experimental results show significant improvements of our new approach and demonstrate the effectiveness of our proposed strategies.

* Accepted by TASLP. 15 pages, 7 figures;

Via

Access Paper or Ask Questions

FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation

Jul 22, 2025

Pingyi Fan, Anbai Jiang, Shuwei Zhang, Zhiqiang Lv, Bing Han, Xinhu Zheng, Wenrui Liang, Junjie Li, Wei-Qiang Zhang, Yanmin Qian(+3 more)

Abstract:With the rapid deployment of SCADA systems, how to effectively analyze industrial signals and detect abnormal states is an urgent need for the industry. Due to the significant heterogeneity of these signals, which we summarize as the M5 problem, previous works only focus on small sub-problems and employ specialized models, failing to utilize the synergies between modalities and the powerful scaling law. However, we argue that the M5 signals can be modeled in a unified manner due to the intrinsic similarity. As a result, we propose FISHER, a Foundation model for multi-modal Industrial Signal compreHEnsive Representation. To support arbitrary sampling rates, FISHER considers the increment of sampling rate as the concatenation of sub-band information. Specifically, FISHER takes the STFT sub-band as the modeling unit and adopts a teacher student SSL framework for pre-training. We also develop the RMIS benchmark, which evaluates the representations of M5 industrial signals on multiple health management tasks. Compared with top SSL models, FISHER showcases versatile and outstanding capabilities with a general performance gain up to 5.03%, along with much more efficient scaling curves. We also investigate the scaling law on downstream tasks and derive potential avenues for future works. FISHER is now open-sourced on https://github.com/jianganbai/FISHER

* 11 pages, 6 figures

Via

Access Paper or Ask Questions

SkyVLN: Vision-and-Language Navigation and NMPC Control for UAVs in Urban Environments

Jul 09, 2025

Tianshun Li, Tianyi Huai, Zhen Li, Yichun Gao, Haoang Li, Xinhu Zheng

Figure 1 for SkyVLN: Vision-and-Language Navigation and NMPC Control for UAVs in Urban Environments

Figure 2 for SkyVLN: Vision-and-Language Navigation and NMPC Control for UAVs in Urban Environments

Figure 3 for SkyVLN: Vision-and-Language Navigation and NMPC Control for UAVs in Urban Environments

Figure 4 for SkyVLN: Vision-and-Language Navigation and NMPC Control for UAVs in Urban Environments

Abstract:Unmanned Aerial Vehicles (UAVs) have emerged as versatile tools across various sectors, driven by their mobility and adaptability. This paper introduces SkyVLN, a novel framework integrating vision-and-language navigation (VLN) with Nonlinear Model Predictive Control (NMPC) to enhance UAV autonomy in complex urban environments. Unlike traditional navigation methods, SkyVLN leverages Large Language Models (LLMs) to interpret natural language instructions and visual observations, enabling UAVs to navigate through dynamic 3D spaces with improved accuracy and robustness. We present a multimodal navigation agent equipped with a fine-grained spatial verbalizer and a history path memory mechanism. These components allow the UAV to disambiguate spatial contexts, handle ambiguous instructions, and backtrack when necessary. The framework also incorporates an NMPC module for dynamic obstacle avoidance, ensuring precise trajectory tracking and collision prevention. To validate our approach, we developed a high-fidelity 3D urban simulation environment using AirSim, featuring realistic imagery and dynamic urban elements. Extensive experiments demonstrate that SkyVLN significantly improves navigation success rates and efficiency, particularly in new and unseen environments.

* 8 pages, 9 figures, has been accepted by IROS 2025

Via

Access Paper or Ask Questions

RationalVLA: A Rational Vision-Language-Action Model with Dual System

Jun 12, 2025

Wenxuan Song, Jiayi Chen, Wenxue Li, Xu He, Han Zhao, Pengxiang Ding Shiyan Su, Feilong Tang, Xuelian Cheng, Donglin Wang, Zongyuan Ge(+5 more)

Abstract:A fundamental requirement for real-world robotic deployment is the ability to understand and respond to natural language instructions. Existing language-conditioned manipulation tasks typically assume that instructions are perfectly aligned with the environment. This assumption limits robustness and generalization in realistic scenarios where instructions may be ambiguous, irrelevant, or infeasible. To address this problem, we introduce RAtional MAnipulation (RAMA), a new benchmark that challenges models with both unseen executable instructions and defective ones that should be rejected. In RAMA, we construct a dataset with over 14,000 samples, including diverse defective instructions spanning six dimensions: visual, physical, semantic, motion, safety, and out-of-context. We further propose the Rational Vision-Language-Action model (RationalVLA). It is a dual system for robotic arms that integrates the high-level vision-language model with the low-level manipulation policy by introducing learnable latent space embeddings. This design enables RationalVLA to reason over instructions, reject infeasible commands, and execute manipulation effectively. Experiments demonstrate that RationalVLA outperforms state-of-the-art baselines on RAMA by a 14.5% higher success rate and 0.94 average task length, while maintaining competitive performance on standard manipulation tasks. Real-world trials further validate its effectiveness and robustness in practical applications. Our project page is https://irpn-eai.github.io/rationalvla.

* 14 pages

Via

Access Paper or Ask Questions

SoftHGNN: Soft Hypergraph Neural Networks for General Visual Recognition

May 21, 2025

Mengqi Lei, Yihong Wu, Siqi Li, Xinhu Zheng, Juan Wang, Yue Gao, Shaoyi Du

Abstract:Visual recognition relies on understanding both the semantics of image tokens and the complex interactions among them. Mainstream self-attention methods, while effective at modeling global pair-wise relations, fail to capture high-order associations inherent in real-world scenes and often suffer from redundant computation. Hypergraphs extend conventional graphs by modeling high-order interactions and offer a promising framework for addressing these limitations. However, existing hypergraph neural networks typically rely on static and hard hyperedge assignments, leading to excessive and redundant hyperedges with hard binary vertex memberships that overlook the continuity of visual semantics. To overcome these issues, we present Soft Hypergraph Neural Networks (SoftHGNNs), which extend the methodology of hypergraph computation, to make it truly efficient and versatile in visual recognition tasks. Our framework introduces the concept of soft hyperedges, where each vertex is associated with hyperedges via continuous participation weights rather than hard binary assignments. This dynamic and differentiable association is achieved by using the learnable hyperedge prototype. Through similarity measurements between token features and the prototype, the model generates semantically rich soft hyperedges. SoftHGNN then aggregates messages over soft hyperedges to capture high-order semantics. To further enhance efficiency when scaling up the number of soft hyperedges, we incorporate a sparse hyperedge selection mechanism that activates only the top-k important hyperedges, along with a load-balancing regularizer to ensure balanced hyperedge utilization. Experimental results across three tasks on five datasets demonstrate that SoftHGNN efficiently captures high-order associations in visual scenes, achieving significant performance improvements.

Via

Access Paper or Ask Questions

A Physics-informed End-to-End Occupancy Framework for Motion Planning of Autonomous Vehicles

May 08, 2025

Shuqi Shen, Junjie Yang, Hongliang Lu, Hui Zhong, Qiming Zhang, Xinhu Zheng

Abstract:Accurate and interpretable motion planning is essential for autonomous vehicles (AVs) navigating complex and uncertain environments. While recent end-to-end occupancy prediction methods have improved environmental understanding, they typically lack explicit physical constraints, limiting safety and generalization. In this paper, we propose a unified end-to-end framework that integrates verifiable physical rules into the occupancy learning process. Specifically, we embed artificial potential fields (APF) as physics-informed guidance during network training to ensure that predicted occupancy maps are both data-efficient and physically plausible. Our architecture combines convolutional and recurrent neural networks to capture spatial and temporal dependencies while preserving model flexibility. Experimental results demonstrate that our method improves task completion rate, safety margins, and planning efficiency across diverse driving scenarios, confirming its potential for reliable deployment in real-world AV systems.

Via

Access Paper or Ask Questions

Cross-cultural Deployment of Autonomous Vehicles Using Data-light Inverse Reinforcement Learning

Apr 15, 2025

Hongliang Lu, Shuqi Shen, Junjie Yang, Chao Lu, Xinhu Zheng, Hai Yang

Abstract:More than the adherence to specific traffic regulations, driving culture touches upon a more implicit part - an informal, conventional, collective behavioral pattern followed by drivers - that varies across countries, regions, and even cities. Such cultural divergence has become one of the biggest challenges in deploying autonomous vehicles (AVs) across diverse regions today. The current emergence of data-driven methods has shown a potential solution to enable culture-compatible driving through learning from data, but what if some underdeveloped regions cannot provide sufficient local data to inform driving culture? This issue is particularly significant for a broader global AV market. Here, we propose a cross-cultural deployment scheme for AVs, called data-light inverse reinforcement learning, designed to re-calibrate culture-specific AVs and assimilate them into other cultures. First, we report the divergence in driving cultures through a comprehensive comparative analysis of naturalistic driving datasets on highways from three countries: Germany, China, and the USA. Then, we demonstrate the effectiveness of our scheme by testing the expeditious cross-cultural deployment across these three countries, with cumulative testing mileage of over 56084 km. The performance is particularly advantageous when cross-cultural deployment is carried out without affluent local data. Results show that we can reduce the dependence on local data by a margin of 98.67% at best. This study is expected to bring a broader, fairer AV global market, particularly in those regions that lack enough local data to develop culture-compatible AVs.

Via

Access Paper or Ask Questions

Sequential Task Assignment and Resource Allocation in V2X-Enabled Mobile Edge Computing

Mar 26, 2025

Yufei Ye, Shijian Gao, Xinhu Zheng, Liuqing Yang

Figure 1 for Sequential Task Assignment and Resource Allocation in V2X-Enabled Mobile Edge Computing

Figure 2 for Sequential Task Assignment and Resource Allocation in V2X-Enabled Mobile Edge Computing

Abstract:Nowadays, the convergence of Mobile Edge Computing (MEC) and vehicular networks has emerged as a vital facilitator for the ever-increasing intelligent onboard applications. This paper introduces a multi-tier task offloading mechanism for MEC-enabled vehicular networks leveraging vehicle-to-everything (V2X) communications. The study focuses on applications with sequential subtasks and explores two tiers of collaboration. In the vehicle tier, we design a needing vehicle (NV)-helping vehicle (HV) matching scheme and inter-vehicle collaborative computation is studied, with joint optimization of task offloading decision, communication, and computation resource allocation to minimize energy consumption and meet latency requirements. In the roadside unit (RSU) tier, collaboration among RSUs is investigated to address multi-access issues of bandwidth and computation resources for multiple vehicles. A two-step method is proposed to solve the subchannel allocation problem. Detailed experiments are conducted to demonstrate the effectiveness of the proposed method and assess the impact of different parameters on system energy consumption.

Via

Access Paper or Ask Questions