Jinglin Li

Beyond BEV: Optimizing Point-Level Tokens for Collaborative Perception

Aug 27, 2025

Yang Li, Quan Yuan, Guiyang Luo, Xiaoyuan Fu, Rui Pan, Yujia Yang, Congzhang Shao, Yuewen Liu, Jinglin Li

Abstract: Collaborative perception allows agents to enhance their perceptual capabilities by exchanging intermediate features. Existing methods typically organize these intermediate features as 2D bird's-eye-view (BEV) representations, which discard critical fine-grained 3D structural cues essential for accurate object recognition and localization. To this end, we first introduce point-level tokens as intermediate representations for collaborative perception. However, point-cloud data are inherently unordered, massive, and position-sensitive, making it challenging to produce compact and aligned point-level token sequences that preserve detailed structural information. Therefore, we present CoPLOT, a novel Collaborative perception framework that utilizes Point-Level Optimized Tokens. It incorporates a point-native processing pipeline, including token reordering, sequence modeling, and multi-agent spatial alignment. A semantic-aware token reordering module generates adaptive 1D reorderings by leveraging scene-level and token-level semantic information. A frequency-enhanced state space model captures long-range sequence dependencies across both spatial and spectral domains, improving the differentiation between foreground tokens and background clutter. Lastly, a neighbor-to-ego alignment module applies a closed-loop process, combining global agent-level correction with local token-level refinement to mitigate localization noise. Extensive experiments on both simulated and real-world datasets show that CoPLOT outperforms state-of-the-art models, with even lower communication and computation overhead. Code will be available at https://github.com/CheeryLeeyy/CoPLOT.
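As a rough illustration of two steps the abstract names, the Python sketch below serializes an unordered set of point tokens into a deterministic 1D sequence and then applies a global neighbor-to-ego rigid correction. It is a minimal sketch under stated assumptions, not CoPLOT's implementation: the `PointToken` type, its per-token `score` (a stand-in for the scene-level and token-level semantics the reordering module learns), the sort key, and the yaw-plus-translation pose model are all hypothetical, and the local token-level refinement is omitted.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class PointToken:
    # Hypothetical token: 3D position plus a per-token semantic score
    # (a stand-in for learned scene- and token-level semantics).
    x: float
    y: float
    z: float
    score: float


def reorder_tokens(tokens: List[PointToken]) -> List[PointToken]:
    """Turn an unordered token set into a deterministic 1D sequence.

    Higher-score tokens come first; ties are broken by spatial position
    (x, then y, then z) so the same set always yields the same order.
    """
    return sorted(tokens, key=lambda t: (-t.score, t.x, t.y, t.z))


def neighbor_to_ego(tokens: List[PointToken], yaw: float,
                    tx: float, ty: float) -> List[PointToken]:
    """Global agent-level correction under an assumed 2D rigid pose.

    Rotates each neighbor token about the z-axis by `yaw` radians, then
    translates it by (tx, ty) into the ego frame; z and score are kept.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        PointToken(c * t.x - s * t.y + tx, s * t.x + c * t.y + ty, t.z, t.score)
        for t in tokens
    ]


# Usage: three neighbor tokens, then a 90-degree yaw and a 10 m x-offset.
tokens = [
    PointToken(1.0, 0.0, 0.0, 0.2),
    PointToken(0.0, 2.0, 0.5, 0.9),
    PointToken(-1.0, 0.0, 0.0, 0.9),
]
ordered = reorder_tokens(tokens)
aligned = neighbor_to_ego(ordered, math.pi / 2, 10.0, 0.0)
print([(round(t.x, 3), round(t.y, 3)) for t in aligned])
```

Sorting with a spatial tie-break keeps the sequence order stable for a fixed input set, which is the minimum a sequence model needs when its input is an inherently unordered point set; a learned reordering would replace the fixed key.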

One is Plenty: A Polymorphic Feature Interpreter for Immutable Heterogeneous Collaborative Perception

Nov 25, 2024

Yuchen Xia, Quan Yuan, Guiyang Luo, Xiaoyuan Fu, Yang Li, Xuanhan Zhu, Tianyou Luo, Siheng Chen, Jinglin Li

Abstract: Collaborative perception in autonomous driving significantly enhances the perception capabilities of individual agents. Immutable heterogeneity in collaborative perception, where agents have different and fixed perception networks, presents a major challenge: the semantic gap between their exchanged intermediate features must be bridged without modifying the perception networks. Most existing methods bridge the semantic gap through interpreters. However, they either require training a new interpreter for each new agent type, limiting extensibility, or rely on a two-stage interpretation via an intermediate standardized semantic space, causing cumulative semantic loss. To achieve both extensibility in immutable heterogeneous scenarios and low-loss feature interpretation, we propose PolyInter, a polymorphic feature interpreter. It contains an extension point through which emerging new agents can seamlessly integrate by overriding only their specific prompts, which are learnable parameters intended to guide the interpretation, while reusing PolyInter's remaining parameters. By leveraging polymorphism, our design ensures that a single interpreter is sufficient to accommodate diverse agents and interpret their features into the ego agent's semantic space. Experiments conducted on the OPV2V dataset demonstrate that PolyInter improves collaborative perception precision by up to 11.1% compared to SOTA interpreters, while comparable results can be achieved by training only 1.4% of PolyInter's parameters when adapting to new agents.
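The extension-point idea maps onto ordinary class polymorphism. The Python sketch below is a hypothetical illustration only, not PolyInter's architecture: a base class holds the parameters every agent type reuses, each agent type overrides only its `prompt()` method, and the prompt is modeled as a simple additive bias on a scaled feature vector. The class names, the `shared_scale` parameter, and the additive interpretation are assumptions chosen to keep the example small.

```python
from typing import List


class PolymorphicInterpreter:
    """Hypothetical interpreter: shared parameters plus an overridable prompt."""

    # Parameters reused by every agent type (a single scalar here).
    shared_scale: float = 2.0

    def prompt(self) -> List[float]:
        # Extension point: each agent type supplies its own learnable prompt.
        raise NotImplementedError

    def interpret(self, feature: List[float]) -> List[float]:
        # Map a neighbor feature into the ego semantic space using the
        # shared scale and the agent-specific prompt (an additive bias).
        p = self.prompt()
        if len(p) != len(feature):
            raise ValueError("prompt and feature dimensions differ")
        return [self.shared_scale * f + b for f, b in zip(feature, p)]


class LidarAgent(PolymorphicInterpreter):
    def prompt(self) -> List[float]:
        return [0.1, -0.1]


class CameraAgent(PolymorphicInterpreter):
    def prompt(self) -> List[float]:
        return [0.5, 0.5]


# One interpreter interface serves both agent types; supporting a new agent
# type means writing one more subclass that overrides prompt(), nothing else.
feature = [1.0, 2.0]
for agent in (LidarAgent(), CameraAgent()):
    print(type(agent).__name__, agent.interpret(feature))
```

In this toy setup, a new agent type adds only as many trainable values as its prompt has entries, while `shared_scale` and `interpret()` are inherited unchanged, which mirrors the extensibility argument in the abstract.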

CollaMamba: Efficient Collaborative Perception with Cross-Agent Spatial-Temporal State Space Model

Sep 12, 2024

Yang Li, Quan Yuan, Guiyang Luo, Xiaoyuan Fu, Xuanhan Zhu, Yujia Yang, Rui Pan, Jinglin Li

Abstract: By sharing complementary perceptual information, multi-agent collaborative perception fosters a deeper understanding of the environment. Recent studies on collaborative perception mostly utilize CNNs or Transformers to learn feature representation and fusion in the spatial dimension, which struggle to handle long-range spatial-temporal features under limited computing and communication resources. Holistically modeling the dependencies over extensive spatial areas and extended temporal frames is crucial to enhancing feature quality. To this end, we propose a resource-efficient cross-agent spatial-temporal collaborative state space model (SSM), named CollaMamba. First, we construct a foundational backbone network based on spatial SSM. This backbone adeptly captures positional causal dependencies from both single-agent and cross-agent views, yielding compact and comprehensive intermediate features while maintaining linear complexity. Furthermore, we devise a history-aware feature boosting module based on temporal SSM, extracting contextual cues from extended historical frames to refine vague features while preserving low overhead. Extensive experiments across several datasets demonstrate that CollaMamba outperforms state-of-the-art methods, achieving higher model accuracy while reducing computational and communication overhead by up to 71.9% and 1/64, respectively. This work pioneers the exploration of Mamba's potential in collaborative perception. The source code will be made available.
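The linear-complexity claim rests on the recurrent form of a state space model, in which each sequence element updates a fixed-size hidden state exactly once. The Python sketch below shows that recurrence for a scalar, time-invariant SSM, h_t = a·h_{t-1} + b·x_t and y_t = c·h_t, applied to a cross-agent sequence built by appending a neighbor's tokens to the ego agent's. It is a simplified illustration under stated assumptions, not CollaMamba's backbone: Mamba-style selective SSMs make the parameters input-dependent and use vector states, both omitted here, and the ego-then-neighbor concatenation order is hypothetical.

```python
from typing import List


def ssm_scan(xs: List[float], a: float, b: float, c: float) -> List[float]:
    """Scalar discrete SSM: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    One pass over the sequence, so the cost grows linearly with its length,
    unlike self-attention, which compares every pair of elements.
    """
    h = 0.0  # hidden state carries context from all earlier elements
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


# Hypothetical cross-agent view: ego tokens followed by a neighbor's tokens,
# so later (neighbor) outputs are conditioned on the ego context.
ego = [1.0, 0.0]
neighbor = [0.0, 2.0]
print(ssm_scan(ego + neighbor, a=0.5, b=1.0, c=1.0))  # [1.0, 0.5, 0.25, 2.125]
```

The same function could equally be run over a sequence of per-frame features, which is roughly the role the abstract assigns to the temporal SSM in the history-aware boosting module; with |a| < 1, older inputs decay geometrically in the state.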

* Submitted to AAAI 2025

Toward Intelligent and Efficient 6G Networks: JCSC Enabled On-Purpose Machine Communications

Jun 30, 2023

Ping Zhang, Heng Yang, Zhiyong Feng, Yanpeng Cui, Jincheng Dai, Xiaoqi Qin, Jinglin Li, Qixun Zhang

Abstract: Driven by the vision of "intelligent connection of everything" toward 6G, the collective intelligence of networked machines can be fully exploited to improve system efficiency by shifting the paradigm of wireless communication design from naive maximalist approaches to intelligent value-based approaches. In this article, we propose an on-purpose machine communication framework enabled by joint communication, sensing, and computation (JCSC) technology, which employs machine semantics as the interactive information flow. Several technical barriers must be overcome before on-purpose communications can be widely adopted, including defining machine purpose, designing fast and concise networking strategies, and building semantics-aware information exchange mechanisms for task-oriented cooperation. Hence, we discuss enabling technologies complemented by a range of open challenges. Simulation results show that the proposed framework can significantly reduce networking overhead and improve communication efficiency.

* in IEEE Wireless Communications, vol. 30, no. 1, pp. 150-157, February 2023
* 8 pages, 6 figures
