Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jian Zhou

CRIBS, LTSI

Behavior-Aware Online Prediction of Obstacle Occupancy using Zonotopes

Oct 23, 2025

Alvaro Carrizosa-Rendon, Jian Zhou, Erik Frisk, Vicenc Puig, Fatiha Nejjari

Abstract:Predicting the motion of surrounding vehicles is key to safe autonomous driving, especially in unstructured environments without prior information. This paper proposes a novel online method to accurately predict the occupancy sets of surrounding vehicles based solely on motion observations. The approach is divided into two stages: first, an Extended Kalman Filter and a Linear Programming (LP) problem are used to estimate a compact zonotopic set of control actions; then, a reachability analysis propagates this set to predict future occupancy. The effectiveness of the method has been validated through simulations in an urban environment, showing accurate and compact predictions without relying on prior assumptions or prior training data.

* 64th IEEE Conference on Decision and Control

Via

Access Paper or Ask Questions

Linguistic Neuron Overlap Patterns to Facilitate Cross-lingual Transfer on Low-resource Languages

Aug 23, 2025

Yuemei Xu, Kexin Xu, Jian Zhou, Ling Hu, Lin Gui

Abstract:The current Large Language Models (LLMs) face significant challenges in improving performance on low-resource languages and urgently need data-efficient methods without costly fine-tuning. From the perspective of language-bridge, we propose BridgeX-ICL, a simple yet effective method to improve zero-shot Cross-lingual In-Context Learning (X-ICL) for low-resource languages. Unlike existing works focusing on language-specific neurons, BridgeX-ICL explores whether sharing neurons can improve cross-lingual performance in LLMs or not. We construct neuron probe data from the ground-truth MUSE bilingual dictionaries, and define a subset of language overlap neurons accordingly, to ensure full activation of these anchored neurons. Subsequently, we propose an HSIC-based metric to quantify LLMs' internal linguistic spectrum based on overlap neurons, which guides optimal bridge selection. The experiments conducted on 2 cross-lingual tasks and 15 language pairs from 7 diverse families (covering both high-low and moderate-low pairs) validate the effectiveness of BridgeX-ICL and offer empirical insights into the underlying multilingual mechanisms of LLMs.

Via

Access Paper or Ask Questions

MHANet: Multi-scale Hybrid Attention Network for Auditory Attention Detection

May 21, 2025

Lu Li, Cunhang Fan, Hongyu Zhang, Jingjing Zhang, Xiaoke Yang, Jian Zhou, Zhao Lv

Figure 1 for MHANet: Multi-scale Hybrid Attention Network for Auditory Attention Detection

Figure 2 for MHANet: Multi-scale Hybrid Attention Network for Auditory Attention Detection

Figure 3 for MHANet: Multi-scale Hybrid Attention Network for Auditory Attention Detection

Figure 4 for MHANet: Multi-scale Hybrid Attention Network for Auditory Attention Detection

Abstract:Auditory attention detection (AAD) aims to detect the target speaker in a multi-talker environment from brain signals, such as electroencephalography (EEG), which has made great progress. However, most AAD methods solely utilize attention mechanisms sequentially and overlook valuable multi-scale contextual information within EEG signals, limiting their ability to capture long-short range spatiotemporal dependencies simultaneously. To address these issues, this paper proposes a multi-scale hybrid attention network (MHANet) for AAD, which consists of the multi-scale hybrid attention (MHA) module and the spatiotemporal convolution (STC) module. Specifically, MHA combines channel attention and multi-scale temporal and global attention mechanisms. This effectively extracts multi-scale temporal patterns within EEG signals and captures long-short range spatiotemporal dependencies simultaneously. To further improve the performance of AAD, STC utilizes temporal and spatial convolutions to aggregate expressive spatiotemporal representations. Experimental results show that the proposed MHANet achieves state-of-the-art performance with fewer trainable parameters across three datasets, 3 times lower than that of the most advanced model. Code is available at: https://github.com/fchest/MHANet.

Via

Access Paper or Ask Questions

ListenNet: A Lightweight Spatio-Temporal Enhancement Nested Network for Auditory Attention Detection

May 15, 2025

Cunhang Fan, Xiaoke Yang, Hongyu Zhang, Ying Chen, Lu Li, Jian Zhou, Zhao Lv

Figure 1 for ListenNet: A Lightweight Spatio-Temporal Enhancement Nested Network for Auditory Attention Detection

Figure 2 for ListenNet: A Lightweight Spatio-Temporal Enhancement Nested Network for Auditory Attention Detection

Figure 3 for ListenNet: A Lightweight Spatio-Temporal Enhancement Nested Network for Auditory Attention Detection

Figure 4 for ListenNet: A Lightweight Spatio-Temporal Enhancement Nested Network for Auditory Attention Detection

Abstract:Auditory attention detection (AAD) aims to identify the direction of the attended speaker in multi-speaker environments from brain signals, such as Electroencephalography (EEG) signals. However, existing EEG-based AAD methods overlook the spatio-temporal dependencies of EEG signals, limiting their decoding and generalization abilities. To address these issues, this paper proposes a Lightweight Spatio-Temporal Enhancement Nested Network (ListenNet) for AAD. The ListenNet has three key components: Spatio-temporal Dependency Encoder (STDE), Multi-scale Temporal Enhancement (MSTE), and Cross-Nested Attention (CNA). The STDE reconstructs dependencies between consecutive time windows across channels, improving the robustness of dynamic pattern extraction. The MSTE captures temporal features at multiple scales to represent both fine-grained and long-range temporal patterns. In addition, the CNA integrates hierarchical features more effectively through novel dynamic attention mechanisms to capture deep spatio-temporal correlations. Experimental results on three public datasets demonstrate the superiority of ListenNet over state-of-the-art methods in both subject-dependent and challenging subject-independent settings, while reducing the trainable parameter count by approximately 7 times. Code is available at:https://github.com/fchest/ListenNet.

Via

Access Paper or Ask Questions

SparseMeXT Unlocking the Potential of Sparse Representations for HD Map Construction

May 12, 2025

Anqing Jiang, Jinhao Chai, Yu Gao, Yiru Wang, Yuwen Heng, Zhigang Sun, Hao Sun, Zezhong Zhao, Li Sun, Jian Zhou(+3 more)

Abstract:Recent advancements in high-definition \emph{HD} map construction have demonstrated the effectiveness of dense representations, which heavily rely on computationally intensive bird's-eye view \emph{BEV} features. While sparse representations offer a more efficient alternative by avoiding dense BEV processing, existing methods often lag behind due to the lack of tailored designs. These limitations have hindered the competitiveness of sparse representations in online HD map construction. In this work, we systematically revisit and enhance sparse representation techniques, identifying key architectural and algorithmic improvements that bridge the gap with--and ultimately surpass--dense approaches. We introduce a dedicated network architecture optimized for sparse map feature extraction, a sparse-dense segmentation auxiliary task to better leverage geometric and semantic cues, and a denoising module guided by physical priors to refine predictions. Through these enhancements, our method achieves state-of-the-art performance on the nuScenes dataset, significantly advancing HD map construction and centerline detection. Specifically, SparseMeXt-Tiny reaches a mean average precision \emph{mAP} of 55.5% at 32 frames per second \emph{fps}, while SparseMeXt-Base attains 65.2% mAP. Scaling the backbone and decoder further, SparseMeXt-Large achieves an mAP of 68.9% at over 20 fps, establishing a new benchmark for sparse representations in HD map construction. These results underscore the untapped potential of sparse methods, challenging the conventional reliance on dense representations and redefining efficiency-performance trade-offs in the field.

Via

Access Paper or Ask Questions

Extracting Abstraction Dimensions by Identifying Syntax Pattern from Texts

Apr 26, 2025

Jian Zhou, Jiazheng Li, Sirui Zhuge, Hai Zhuge

Abstract:This paper proposed an approach to automatically discovering subject dimension, action dimension, object dimension and adverbial dimension from texts to efficiently operate texts and support query in natural language. The high quality of trees guarantees that all subjects, actions, objects and adverbials and their subclass relations within texts can be represented. The independency of trees ensures that there is no redundant representation between trees. The expressiveness of trees ensures that the majority of sentences can be accessed from each tree and the rest of sentences can be accessed from at least one tree so that the tree-based search mechanism can support querying in natural language. Experiments show that the average precision, recall and F1-score of the abstraction trees constructed by the subclass relations of subject, action, object and adverbial are all greater than 80%. The application of the proposed approach to supporting query in natural language demonstrates that different types of question patterns for querying subject or object have high coverage of texts, and searching multiple trees on subject, action, object and adverbial according to the question pattern can quickly reduce search space to locate target sentences, which can support precise operation on texts.

* 25pages, 3 figures, 8 tables

Via

Access Paper or Ask Questions

Robust Federated Learning in Unreliable Wireless Networks: A Client Selection Approach

Feb 26, 2025

Yanmeng Wang, Wenkai Ji, Jian Zhou, Fu Xiao, Tsung-Hui Chang

Abstract:Federated learning (FL) has emerged as a promising distributed learning paradigm for training deep neural networks (DNNs) at the wireless edge, but its performance can be severely hindered by unreliable wireless transmission and inherent data heterogeneity among clients. Existing solutions primarily address these challenges by incorporating wireless resource optimization strategies, often focusing on uplink resource allocation across clients under the assumption of homogeneous client-server network standards. However, these approaches overlooked the fact that mobile clients may connect to the server via diverse network standards (e.g., 4G, 5G, Wi-Fi) with customized configurations, limiting the flexibility of server-side modifications and restricting applicability in real-world commercial networks. This paper presents a novel theoretical analysis about how transmission failures in unreliable networks distort the effective label distributions of local samples, causing deviations from the global data distribution and introducing convergence bias in FL. Our analysis reveals that a carefully designed client selection strategy can mitigate biases induced by network unreliability and data heterogeneity. Motivated by this insight, we propose FedCote, a client selection approach that optimizes client selection probabilities without relying on wireless resource scheduling. Experimental results demonstrate the robustness of FedCote in DNN-based classification tasks under unreliable networks with frequent transmission failures.

Via

Access Paper or Ask Questions

BSDB-Net: Band-Split Dual-Branch Network with Selective State Spaces Mechanism for Monaural Speech Enhancement

Dec 26, 2024

Cunhang Fan, Enrui Liu, Andong Li, Jianhua Tao, Jian Zhou, Jiahao Li, Chengshi Zheng, Zhao Lv

Figure 1 for BSDB-Net: Band-Split Dual-Branch Network with Selective State Spaces Mechanism for Monaural Speech Enhancement

Figure 2 for BSDB-Net: Band-Split Dual-Branch Network with Selective State Spaces Mechanism for Monaural Speech Enhancement

Figure 3 for BSDB-Net: Band-Split Dual-Branch Network with Selective State Spaces Mechanism for Monaural Speech Enhancement

Figure 4 for BSDB-Net: Band-Split Dual-Branch Network with Selective State Spaces Mechanism for Monaural Speech Enhancement

Abstract:Although the complex spectrum-based speech enhancement(SE) methods have achieved significant performance, coupling amplitude and phase can lead to a compensation effect, where amplitude information is sacrificed to compensate for the phase that is harmful to SE. In addition, to further improve the performance of SE, many modules are stacked onto SE, resulting in increased model complexity that limits the application of SE. To address these problems, we proposed a dual-path network based on compressed frequency using Mamba. First, we extract amplitude and phase information through parallel dual branches. This approach leverages structured complex spectra to implicitly capture phase information and solves the compensation effect by decoupling amplitude and phase, and the network incorporates an interaction module to suppress unnecessary parts and recover missing components from the other branch. Second, to reduce network complexity, the network introduces a band-split strategy to compress the frequency dimension. To further reduce complexity while maintaining good performance, we designed a Mamba-based module that models the time and frequency dimensions under linear complexity. Finally, compared to baselines, our model achieves an average 8.3 times reduction in computational complexity while maintaining superior performance. Furthermore, it achieves a 25 times reduction in complexity compared to transformer-based models.

* Accepted by AAAI 2025

Via

Access Paper or Ask Questions

Uncertainty-Aware Decision-Making and Planning for Autonomous Forced Merging

Oct 27, 2024

Jian Zhou, Yulong Gao, Björn Olofsson, Erik Frisk

Abstract:In this paper, we develop an uncertainty-aware decision-making and motion-planning method for an autonomous ego vehicle in forced merging scenarios, considering the motion uncertainty of surrounding vehicles. The method dynamically captures the uncertainty of surrounding vehicles by online estimation of their acceleration bounds, enabling a reactive but rapid understanding of the uncertainty characteristics of the surrounding vehicles. By leveraging these estimated bounds, a non-conservative forward occupancy of surrounding vehicles is predicted over a horizon, which is incorporated in both the decision-making process and the motion-planning strategy, to enhance the resilience and safety of the planned reference trajectory. The method successfully fulfills the tasks in challenging forced merging scenarios, and the properties are illustrated by comparison with several alternative approaches.

* Accepted by the 63rd IEEE Conference on Decision and Control, 2024

Via

Access Paper or Ask Questions

Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports

May 23, 2024

Guangyu Guo, Jiawen Yao, Yingda Xia, Tony C. W. Mok, Zhilin Zheng, Junwei Han, Le Lu, Dingwen Zhang, Jian Zhou, Ling Zhang

Figure 1 for Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports

Figure 2 for Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports

Figure 3 for Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports

Figure 4 for Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports

Abstract:The absence of adequately sufficient expert-level tumor annotations hinders the effectiveness of supervised learning based opportunistic cancer screening on medical imaging. Clinical reports (that are rich in descriptive textual details) can offer a "free lunch'' supervision information and provide tumor location as a type of weak label to cope with screening tasks, thus saving human labeling workloads, if properly leveraged. However, predicting cancer only using such weak labels can be very changeling since tumors are usually presented in small anatomical regions compared to the whole 3D medical scans. Weakly semi-supervised learning (WSSL) utilizes a limited set of voxel-level tumor annotations and incorporates alongside a substantial number of medical images that have only off-the-shelf clinical reports, which may strike a good balance between minimizing expert annotation workload and optimizing screening efficacy. In this paper, we propose a novel text-guided learning method to achieve highly accurate cancer detection results. Through integrating diagnostic and tumor location text prompts into the text encoder of a vision-language model (VLM), optimization of weakly supervised learning can be effectively performed in the latent space of VLM, thereby enhancing the stability of training. Our approach can leverage clinical knowledge by large-scale pre-trained VLM to enhance generalization ability, and produce reliable pseudo tumor masks to improve cancer detection. Our extensive quantitative experimental results on a large-scale cancer dataset, including 1,651 unique patients, validate that our approach can reduce human annotation efforts by at least 70% while maintaining comparable cancer detection accuracy to competing fully supervised methods (AUC value 0.961 versus 0.966).

Via

Access Paper or Ask Questions