Shanshan Zhang

MambaIO: Global-Coordinate Inertial Odometry for Pedestrians via Multi-Scale Frequency-Decoupled Modeling

Nov 19, 2025

Shanshan Zhang

Abstract:Inertial Odometry (IO) enables real-time localization using only acceleration and angular velocity measurements from an Inertial Measurement Unit (IMU), making it a promising solution for localization in consumer-grade applications. Traditionally, IMU measurements in IO have been processed under two coordinate system paradigms: the body coordinate frame and the global coordinate frame, with the latter being widely adopted. However, recent studies in drone scenarios have demonstrated that the body frame can significantly improve localization accuracy, prompting a re-evaluation of the suitability of the global frame for pedestrian IO. To address this issue, this paper systematically evaluates the effectiveness of the global coordinate frame in pedestrian IO through theoretical analysis, qualitative inspection, and quantitative experiments. Building upon these findings, we further propose MambaIO, which decomposes IMU measurements into high-frequency and low-frequency components using a Laplacian pyramid. The low-frequency component is processed by a Mamba architecture to extract implicit contextual motion cues, while the high-frequency component is handled by a convolutional structure to capture fine-grained local motion details. Experiments on multiple public datasets show that MambaIO substantially reduces localization error and achieves state-of-the-art (SOTA) performance. To the best of our knowledge, this is the first application of the Mamba architecture to the inertial odometry task.
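The frequency split described in the abstract can be illustrated on a single IMU channel. The following is a minimal sketch of a 1-D Laplacian pyramid, not the authors' implementation: the binomial smoothing kernel, the linear-interpolation upsampling, the number of levels, and all function names are assumptions made for illustration. Each level stores a high-frequency residual, and the coarsest signal is the low-frequency component.

```python
import math

def smooth(x):
    """Blur with the 5-tap binomial kernel [1, 4, 6, 4, 1] / 16, replicating edges."""
    kernel = (1, 4, 6, 4, 1)
    n = len(x)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - 2, 0), n - 1)  # clamp to valid indices
            acc += w * x[idx]
        out.append(acc / 16.0)
    return out

def downsample(x):
    """Low-pass filter, then keep every second sample."""
    return smooth(x)[::2]

def upsample(x, n):
    """Linearly interpolate x back to length n."""
    out = []
    for i in range(n):
        pos = i / 2.0
        lo = min(int(pos), len(x) - 1)
        hi = min(lo + 1, len(x) - 1)
        frac = pos - int(pos)
        out.append((1.0 - frac) * x[lo] + frac * x[hi])
    return out

def laplacian_pyramid(x, levels):
    """Return (high-frequency residuals per level, coarsest low-frequency signal)."""
    highs = []
    cur = list(x)
    for _ in range(levels):
        low = downsample(cur)
        up = upsample(low, len(cur))
        highs.append([c - u for c, u in zip(cur, up)])  # detail lost by the blur
        cur = low
    return highs, cur

def reconstruct(highs, low):
    """Invert the pyramid by adding the residuals back, coarsest level first."""
    cur = low
    for h in reversed(highs):
        up = upsample(cur, len(h))
        cur = [u + d for u, d in zip(up, h)]
    return cur

# Hypothetical accelerometer channel: slow motion plus a fast oscillation.
signal = [math.sin(0.1 * i) + 0.2 * math.sin(2.5 * i) for i in range(64)]
highs, low = laplacian_pyramid(signal, levels=2)
restored = reconstruct(highs, low)
```

Because every residual is defined as the difference between a level and its upsampled coarser version, the decomposition is invertible up to floating-point error. In the design the abstract describes, the low-frequency part would go to the Mamba branch and the high-frequency residuals to the convolutional branch.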

IONext: Unlocking the Next Era of Inertial Odometry

Jul 23, 2025

Shanshan Zhang, Siyue Wang, Tianshui Wen, Qi Zhang, Ziheng Zhou, Lingxiang Zheng, Yu Yang

Abstract:Researchers have increasingly adopted Transformer-based models for inertial odometry. While Transformers excel at modeling long-range dependencies, their limited sensitivity to local, fine-grained motion variations and lack of inherent inductive biases often hinder localization accuracy and generalization. Recent studies have shown that incorporating large-kernel convolutions and Transformer-inspired architectural designs into CNNs can effectively expand the receptive field, thereby improving global motion perception. Motivated by these insights, we propose a novel CNN-based module called the Dual-wing Adaptive Dynamic Mixer (DADM), which adaptively captures both global motion patterns and local, fine-grained motion features from dynamic inputs. This module dynamically generates selective weights based on the input, enabling efficient multi-scale feature aggregation. To further improve temporal modeling, we introduce the Spatio-Temporal Gating Unit (STGU), which selectively extracts representative and task-relevant motion features in the temporal domain. This unit addresses the limitations of temporal modeling observed in existing CNN approaches. Built upon DADM and STGU, we present a new CNN-based inertial odometry backbone, named Next Era of Inertial Odometry (IONext). Extensive experiments on six public datasets demonstrate that IONext consistently outperforms state-of-the-art (SOTA) Transformer- and CNN-based methods. For instance, on the RNIN dataset, IONext reduces the average ATE by 10% and the average RTE by 12% compared to the representative model iMOT.

DepthFusion: Depth-Aware Hybrid Feature Fusion for LiDAR-Camera 3D Object Detection

May 12, 2025

Mingqian Ji, Jian Yang, Shanshan Zhang

Abstract:State-of-the-art LiDAR-camera 3D object detectors usually focus on feature fusion. However, they neglect the factor of depth while designing the fusion strategy. In this work, we are the first to observe that different modalities play different roles as depth varies via statistical analysis and visualization. Based on this finding, we propose a Depth-Aware Hybrid Feature Fusion (DepthFusion) strategy that guides the weights of point cloud and RGB image modalities by introducing depth encoding at both global and local levels. Specifically, the Depth-GFusion module adaptively adjusts the weights of image Bird's-Eye-View (BEV) features in multi-modal global features via depth encoding. Furthermore, to compensate for the information lost when transferring raw features to the BEV space, we propose a Depth-LFusion module, which adaptively adjusts the weights of original voxel features and multi-view image features in multi-modal local features via depth encoding. Extensive experiments on the nuScenes and KITTI datasets demonstrate that our DepthFusion method surpasses previous state-of-the-art methods. Moreover, our DepthFusion is more robust to various kinds of corruptions, outperforming previous methods on the nuScenes-C dataset.

Safe Flow Matching: Robot Motion Planning with Control Barrier Functions

Apr 11, 2025

Xiaobing Dai, Dian Yu, Shanshan Zhang, Zewen Yang

Abstract:Recent advances in generative modeling have led to promising results in robot motion planning, particularly through diffusion and flow-based models that capture complex, multimodal trajectory distributions. However, these methods are typically trained offline and remain limited when faced with unseen environments or dynamic constraints, often lacking explicit mechanisms to ensure safety during deployment. In this work, we propose Safe Flow Matching (SafeFM), a motion planning approach for trajectory generation that integrates flow matching with safety guarantees. By incorporating the proposed flow matching barrier functions, SafeFM ensures that generated trajectories remain within safe regions throughout the planning horizon, even in the presence of previously unseen obstacles or state-action constraints. Unlike diffusion-based approaches, our method allows for direct, efficient sampling of constraint-satisfying trajectories, making it well-suited for real-time motion planning. We evaluate SafeFM on a diverse set of tasks, including planar robot navigation and 7-DoF manipulation, demonstrating superior safety, generalization, and planning performance compared to state-of-the-art generative planners. Comprehensive resources are available on the project website: https://safeflowmatching.github.io/SafeFM/
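For readers unfamiliar with barrier functions, the sketch below shows a generic control barrier function (CBF) safety filter for a planar point robot with velocity control. It is not the flow matching barrier function proposed in the paper: the single-integrator dynamics, the circular obstacle, the proportional nominal controller, the gain `alpha`, and all names are assumptions chosen for illustration. The barrier is h(x) = ||x - c||^2 - r^2, and the filter minimally changes the nominal velocity so that grad h(x) · u + alpha · h(x) >= 0.

```python
def barrier(x, center, radius):
    """h(x) = ||x - c||^2 - r^2; positive outside the obstacle."""
    dx, dy = x[0] - center[0], x[1] - center[1]
    return dx * dx + dy * dy - radius * radius

def safe_velocity(x, u_nom, center, radius, alpha):
    """Project u_nom onto the half-space {u : grad_h . u + alpha * h(x) >= 0}."""
    a = (2.0 * (x[0] - center[0]), 2.0 * (x[1] - center[1]))  # gradient of h
    slack = a[0] * u_nom[0] + a[1] * u_nom[1] + alpha * barrier(x, center, radius)
    if slack >= 0.0:
        return u_nom  # nominal command already satisfies the CBF condition
    scale = slack / (a[0] * a[0] + a[1] * a[1])
    # Closed-form solution of the one-constraint quadratic program.
    return (u_nom[0] - scale * a[0], u_nom[1] - scale * a[1])

def rollout(x0, goal, center, radius, alpha=2.0, dt=0.01, steps=500, use_filter=True):
    """Euler-integrate x' = u and return the final state and the smallest h seen."""
    x = x0
    min_h = barrier(x, center, radius)
    for _ in range(steps):
        u = (goal[0] - x[0], goal[1] - x[1])  # hypothetical go-to-goal controller
        if use_filter:
            u = safe_velocity(x, u, center, radius, alpha)
        x = (x[0] + dt * u[0], x[1] + dt * u[1])
        min_h = min(min_h, barrier(x, center, radius))
    return x, min_h

start, goal = (-2.0, 0.1), (2.0, 0.0)
obstacle_center, obstacle_radius = (0.0, 0.0), 1.0
_, min_h_nominal = rollout(start, goal, obstacle_center, obstacle_radius, use_filter=False)
_, min_h_filtered = rollout(start, goal, obstacle_center, obstacle_radius, use_filter=True)
```

In this toy setting the unfiltered straight-line controller passes through the obstacle, so `min_h_nominal` is negative, while the filtered rollout keeps h positive at every step. The filter only guarantees safety; whether the goal is reached depends on the nominal controller.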

SoccerNet 2024 Challenges Results

Sep 16, 2024

Anthony Cioppa, Silvio Giancola, Vladimir Somers, Victor Joos, Floriane Magera, Jan Held, Seyed Abolfazl Ghasemzadeh, Xin Zhou, Karolina Seweryn, Mateusz Kowalczyk (+74 more)

Abstract:The SoccerNet 2024 challenges represent the fourth annual video understanding challenges organized by the SoccerNet team. These challenges aim to advance research across multiple themes in football, including broadcast video understanding, field understanding, and player understanding. This year, the challenges encompass four vision-based tasks: (1) Ball Action Spotting, focusing on precisely localizing when and which soccer actions related to the ball occur; (2) Dense Video Captioning, focusing on describing the broadcast with natural language and anchored timestamps; (3) Multi-View Foul Recognition, a novel task focusing on analyzing multiple viewpoints of a potential foul incident to classify whether a foul occurred and assess its severity; and (4) Game State Reconstruction, another novel task focusing on reconstructing the game state from broadcast videos onto a 2D top-view map of the field. Detailed information about the tasks, challenges, and leaderboards can be found at https://www.soccer-net.org, with baselines and development kits available at https://github.com/SoccerNet.

* 7 pages, 1 figure

Imagine the Unseen: Occluded Pedestrian Detection via Adversarial Feature Completion

May 02, 2024

Shanshan Zhang, Mingqian Ji, Yang Li, Jian Yang

Abstract:Pedestrian detection has progressed significantly in recent years, thanks to the development of DNNs. However, detection performance in occluded scenes is still far from satisfactory, as occlusion increases the intra-class variance of pedestrians, hindering the model from finding an accurate classification boundary between pedestrians and background clutter. From the perspective of reducing intra-class variance, we propose to complete features for occluded regions so as to align the features of pedestrians across different occlusion patterns. An important premise for feature completion is to locate occluded regions. From our analysis, channel features of different pedestrian proposals only show high correlation values at visible parts, and thus feature correlations can be used to model occlusion patterns. In order to narrow the gap between completed features and real fully visible ones, we propose an adversarial learning method, which completes occluded features with a generator such that they can hardly be distinguished by the discriminator from real fully visible features. We report experimental results on the CityPersons, Caltech and CrowdHuman datasets. On CityPersons, we show significant improvements over five different baseline detectors, especially on the heavy occlusion subset. Furthermore, we show that our proposed method FeatComp++ achieves state-of-the-art results on all three datasets without relying on extra cues.

Fairness Optimization for Intelligent Reflecting Surface Aided Uplink Rate-Splitting Multiple Access

Mar 15, 2024

Shanshan Zhang, Wen Chen, Qingqing Wu, Ziwei Liu, Shunqing Zhang, Jun Li

Abstract:This paper studies the fair transmission design for intelligent reflecting surface (IRS) aided rate-splitting multiple access (RSMA). The IRS is used to establish a good signal propagation environment and enhance the RSMA transmission performance. The fair rate adaptation problem is formulated as a max-min optimization problem. To solve the optimization problem, we adopt an alternating optimization (AO) algorithm to optimize the power allocation, beamforming, and decoding order. A generalized power iteration (GPI) method is proposed to optimize the receive beamforming, which can improve the minimum rate of devices and reduce the optimization complexity. At the base station (BS), a successive group decoding (SGD) algorithm is proposed to tackle the uplink signal estimation, which trades off the fairness and complexity of decoding. We also consider robust communication with imperfect channel state information at the transmitter (CSIT), studying robust optimization by using lower bound expressions on the expected data rates. Extensive numerical results show that the proposed optimization algorithm can significantly improve fairness. It also provides reliable results for uplink communication with imperfect CSIT.

* This work has been submitted to TCOM

Rate-Splitting Multiple Access for Transmissive Reconfigurable Intelligent Surface Transceiver Empowered ISAC System

Feb 19, 2024

Ziwei Liu, Wen Chen, Qingqing Wu, Jinhong Yuan, Shanshan Zhang, Zhendong Li, Jun Li

Abstract:In this paper, a novel transmissive reconfigurable intelligent surface (TRIS) transceiver empowered integrated sensing and communications (ISAC) system is proposed for future multi-demand terminals. To address interference management, we implement rate-splitting multiple access (RSMA), where the common stream is independently designed for the sensing service. We introduce the sensing quality of service (QoS) criteria based on this structure and construct an optimization problem with the sensing QoS criteria as the objective function to optimize the sensing stream precoding matrix and the communication stream precoding matrix. Due to the coupling of optimization variables, the formulated problem is a non-convex optimization problem that cannot be solved directly. To tackle this challenging problem, alternating optimization (AO) is utilized to decouple the optimization variables. Specifically, the problem is decoupled into three subproblems concerning the sensing stream precoding matrix, the communication stream precoding matrix, and the auxiliary variables, which are solved alternately through AO until convergence is reached. Within these subproblems, successive convex approximation (SCA) is applied to deal with the sum-rate threshold constraints on communications, and difference-of-convex (DC) programming is utilized to handle the rank-one non-convex constraints. Numerical simulation results verify the superiority of the proposed scheme in terms of improving the communication and sensing QoS.

Divide and Conquer: Hybrid Pre-training for Person Search

Dec 13, 2023

Yanling Tian, Di Chen, Yunan Liu, Jian Yang, Shanshan Zhang

Abstract:Large-scale pre-training has proven to be an effective method for improving performance across different tasks. Current person search methods use ImageNet pre-trained models for feature extraction, yet this is not an optimal solution due to the gap between the pre-training task and the person search task (as a downstream task). Therefore, in this paper, we focus on pre-training for person search, which involves detecting and re-identifying individuals simultaneously. Although labeled data for person search is scarce, datasets for its two sub-tasks, person detection and re-identification, are relatively abundant. To this end, we propose a hybrid pre-training framework specifically designed for person search using sub-task data only. It consists of a hybrid learning paradigm that handles data with different kinds of supervision, and an intra-task alignment module that alleviates domain discrepancy under limited resources. To the best of our knowledge, this is the first work that investigates how to support full-task pre-training using sub-task data. Extensive experiments demonstrate that our pre-trained model can achieve significant improvements across diverse protocols, such as person search method, fine-tuning data, pre-training data and model backbone. For example, our model yields a 10.3% relative improvement in mAP for ResNet50-based NAE. Our code and pre-trained models are released for plug-and-play usage by the person search community.

* accepted by AAAI24

Fairness Optimization of RSMA for Uplink Communication based on Intelligent Reflecting Surface

Sep 06, 2023

Shanshan Zhang, Wen Chen

Abstract:In this paper, we propose a rate-splitting multiple access (RSMA) scheme for uplink wireless communication systems aided by an intelligent reflecting surface (IRS). In the considered model, the IRS is adopted to overcome power attenuation caused by path loss. We construct a max-min fairness optimization problem to obtain the resource allocation, including the receive beamforming at the base station (BS) and phase-shift beamforming at the IRS. We also introduce a successive group decoding (SGD) algorithm at the receiver, which trades off the fairness and complexity of decoding. Simulation results show that the proposed scheme improves the fairness of uplink communication.

* This paper has been accepted by Globecom 2023
