Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

S. Hu

CSTrack: Enhancing RGB-X Tracking via Compact Spatiotemporal Features

May 26, 2025

X. Feng, D. Zhang, S. Hu, X. Li, M. Wu, J. Zhang, X. Chen, K. Huang

Abstract:Effectively modeling and utilizing spatiotemporal features from RGB and other modalities (\eg, depth, thermal, and event data, denoted as X) is the core of RGB-X tracker design. Existing methods often employ two parallel branches to separately process the RGB and X input streams, requiring the model to simultaneously handle two dispersed feature spaces, which complicates both the model structure and computation process. More critically, intra-modality spatial modeling within each dispersed space incurs substantial computational overhead, limiting resources for inter-modality spatial modeling and temporal modeling. To address this, we propose a novel tracker, CSTrack, which focuses on modeling Compact Spatiotemporal features to achieve simple yet effective tracking. Specifically, we first introduce an innovative Spatial Compact Module that integrates the RGB-X dual input streams into a compact spatial feature, enabling thorough intra- and inter-modality spatial modeling. Additionally, we design an efficient Temporal Compact Module that compactly represents temporal features by constructing the refined target distribution heatmap. Extensive experiments validate the effectiveness of our compact spatiotemporal modeling method, with CSTrack achieving new SOTA results on mainstream RGB-X benchmarks. The code and models will be released at: https://github.com/XiaokunFeng/CSTrack.

* Accepted by ICML25!

Via

Access Paper or Ask Questions

Enhancing Vision-Language Tracking by Effectively Converting Textual Cues into Visual Cues

Dec 27, 2024

X. Feng, D. Zhang, S. Hu, X. Li, M. Wu, J. Zhang, X. Chen, K. Huang

Figure 1 for Enhancing Vision-Language Tracking by Effectively Converting Textual Cues into Visual Cues

Figure 2 for Enhancing Vision-Language Tracking by Effectively Converting Textual Cues into Visual Cues

Figure 3 for Enhancing Vision-Language Tracking by Effectively Converting Textual Cues into Visual Cues

Figure 4 for Enhancing Vision-Language Tracking by Effectively Converting Textual Cues into Visual Cues

Abstract:Vision-Language Tracking (VLT) aims to localize a target in video sequences using a visual template and language description. While textual cues enhance tracking potential, current datasets typically contain much more image data than text, limiting the ability of VLT methods to align the two modalities effectively. To address this imbalance, we propose a novel plug-and-play method named CTVLT that leverages the strong text-image alignment capabilities of foundation grounding models. CTVLT converts textual cues into interpretable visual heatmaps, which are easier for trackers to process. Specifically, we design a textual cue mapping module that transforms textual cues into target distribution heatmaps, visually representing the location described by the text. Additionally, the heatmap guidance module fuses these heatmaps with the search image to guide tracking more effectively. Extensive experiments on mainstream benchmarks demonstrate the effectiveness of our approach, achieving state-of-the-art performance and validating the utility of our method for enhanced VLT.

* Accepted by ICASSP '25 ! Code: https://github.com/XiaokunFeng/CTVLT

Via

Access Paper or Ask Questions

Distributed Machine Learning for Wireless Communication Networks: Techniques, Architectures, and Applications

Dec 02, 2020

S. Hu, X. Chen, W. Ni, E. Hossain, X. Wang

Figure 1 for Distributed Machine Learning for Wireless Communication Networks: Techniques, Architectures, and Applications

Figure 2 for Distributed Machine Learning for Wireless Communication Networks: Techniques, Architectures, and Applications

Figure 3 for Distributed Machine Learning for Wireless Communication Networks: Techniques, Architectures, and Applications

Figure 4 for Distributed Machine Learning for Wireless Communication Networks: Techniques, Architectures, and Applications

Abstract:Distributed machine learning (DML) techniques, such as federated learning, partitioned learning, and distributed reinforcement learning, have been increasingly applied to wireless communications. This is due to improved capabilities of terminal devices, explosively growing data volume, congestion in the radio interfaces, and increasing concern of data privacy. The unique features of wireless systems, such as large scale, geographically dispersed deployment, user mobility, and massive amount of data, give rise to new challenges in the design of DML techniques. There is a clear gap in the existing literature in that the DML techniques are yet to be systematically reviewed for their applicability to wireless systems. This survey bridges the gap by providing a contemporary and comprehensive survey of DML techniques with a focus on wireless networks. Specifically, we review the latest applications of DML in power control, spectrum management, user association, and edge cloud computing. The optimality, scalability, convergence rate, computation cost, and communication overhead of DML are analyzed. We also discuss the potential adversarial attacks faced by DML applications, and describe state-of-the-art countermeasures to preserve privacy and security. Last but not least, we point out a number of key issues yet to be addressed, and collate potentially interesting and challenging topics for future research.

Via

Access Paper or Ask Questions