Abstract: Diversity and multiplexing are the two fundamental gains of multiple-input multiple-output (MIMO) communications, allowing systems to achieve higher reliability and higher data rates simultaneously. The intricate interplay between these two metrics is captured by the celebrated diversity-multiplexing tradeoff (DMT). With the rapid evolution of wireless technologies, low-latency integrated sensing and communication (ISAC) has emerged as a key enabler for 6G applications such as extended reality (XR) and massive digital twins. Understanding the DMT of MIMO ISAC systems is therefore critical. In this paper, we investigate the communication DMT of a mono-static MIMO ISAC system under Rayleigh fading when the transmitter is constrained to emit sensing-optimal waveforms. By unveiling the geometric properties of generalized Stiefel manifolds and employing large-deviation analysis, we characterize the asymptotic outage probability of this typical ISAC channel, which yields an elegant converse bound on the sensing-constrained DMT. Ultimately, our work answers a pivotal open question in ISAC system design: how much MIMO gain must fundamentally be sacrificed in communication to integrate optimal sensing capabilities?
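
The sacrifice is measured against the classical, sensing-unconstrained DMT, which has a well-known closed form: for an M x N i.i.d. Rayleigh channel with block length T >= M + N - 1, the optimal diversity gain d*(r) is the piecewise-linear curve joining the points (k, (M-k)(N-k)) for k = 0, ..., min(M, N). The Python snippet below is a minimal sketch that evaluates only this baseline curve. It does not implement the sensing-constrained bound derived in the paper, and the function name classical_dmt is hypothetical.

```python
import math


def classical_dmt(r: float, m: int, n: int) -> float:
    """Classical optimal diversity gain d*(r) of an m x n i.i.d. Rayleigh MIMO
    channel (Zheng-Tse), assuming block length T >= m + n - 1.

    d*(r) is the piecewise-linear curve through (k, (m - k)(n - k)),
    k = 0, ..., min(m, n). This is the unconstrained baseline, not the
    sensing-constrained bound.
    """
    if m < 1 or n < 1:
        raise ValueError("antenna counts must be positive")
    k_max = min(m, n)
    if not 0 <= r <= k_max:
        raise ValueError("multiplexing gain r must lie in [0, min(m, n)]")
    # Index of the linear segment containing r (r == k_max uses the last segment).
    k = min(math.floor(r), k_max - 1)
    d_left = (m - k) * (n - k)            # diversity at integer point k
    d_right = (m - k - 1) * (n - k - 1)   # diversity at integer point k + 1
    return d_left + (r - k) * (d_right - d_left)


if __name__ == "__main__":
    # Print the baseline curve for a 4 x 4 system at a few multiplexing gains.
    for r in [0, 0.5, 1, 2, 3, 4]:
        print(f"r = {r:>3}: d*(r) = {classical_dmt(r, 4, 4)}")
```

The endpoints recover the two extremes: full diversity M*N at r = 0 and zero diversity at the maximum multiplexing gain min(M, N). A sensing-constrained curve would lie on or below this one.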

Abstract: Recent years have witnessed remarkable advances in Large Vision-Language Models (LVLMs), which have achieved human-level performance across various complex vision-language tasks. Following LLaVA's paradigm, mainstream LVLMs typically employ a shallow MLP for visual-language alignment, trained in two stages: pretraining for cross-modal alignment followed by instruction tuning. Although this approach has proven effective, how the MLP bridges the modality gap remains poorly understood. Some research has explored how LLMs process the transformed visual tokens, but few studies have investigated the underlying alignment mechanism itself. Furthermore, the MLP adapter must be retrained whenever the LLM backbone is switched. To address these limitations, we first investigate the working principles of MLP adapters and find that they progressively learn to project visual embeddings into the subspaces spanned by the corresponding text embeddings. Based on this insight, we propose LangBridge, a novel adapter that explicitly maps visual tokens to linear combinations of LLM vocabulary embeddings. This design enables pretraining-free adapter transfer across different LLMs while maintaining performance. Our experiments show that a LangBridge adapter pretrained on Qwen2-0.5B can be applied directly to larger models such as LLaMA3-8B or Qwen2.5-14B while maintaining competitive performance. Overall, LangBridge enables interpretable vision-language alignment by grounding visual representations in LLM vocabulary embeddings, while its plug-and-play design allows efficient reuse across multiple LLMs with almost no performance degradation. See our project page at https://jiaqiliao77.github.io/LangBridge.github.io/
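
The core idea of mapping visual tokens to linear combinations of vocabulary embeddings can be sketched in a few lines of Python with NumPy. This is a minimal sketch under stated assumptions, not the LangBridge implementation: it assumes a learned linear projection from visual features to vocabulary-sized logits, a softmax that turns those logits into combination weights (the abstract only says "linear combinations"), and a frozen LLM input-embedding table. The function name vocab_combination_adapter, all shapes, and the random inputs are hypothetical, and the sketch does not model how the adapter is transferred between LLMs.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def vocab_combination_adapter(visual_tokens, proj, vocab_embeddings):
    """Hypothetical adapter: map each visual token to a weighted combination
    of vocabulary embeddings (softmax weighting is an assumption).

    visual_tokens:    (num_tokens, d_visual)
    proj:             (d_visual, vocab_size)  hypothetical learned projection
    vocab_embeddings: (vocab_size, d_llm)     frozen LLM embedding table
    """
    logits = visual_tokens @ proj               # (num_tokens, vocab_size)
    weights = softmax(logits, axis=-1)          # each row sums to 1
    return weights @ vocab_embeddings, weights  # outputs: (num_tokens, d_llm)


# Toy example with random placeholder tensors.
rng = np.random.default_rng(0)
vis = rng.standard_normal((4, 16))             # 4 visual tokens, 16-dim features
proj = rng.standard_normal((16, 100)) * 0.1    # 100-token toy vocabulary
emb = rng.standard_normal((100, 32))           # 32-dim toy LLM embeddings
out, w = vocab_combination_adapter(vis, proj, emb)
```

Under the softmax assumption, every projected visual token lies inside the convex hull of the LLM's own token embeddings, which is one concrete way to read "grounding visual representations in LLM vocabulary embeddings."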

Abstract: Accurate localization and perception are pivotal for the safety and reliability of vehicles. However, current localization methods lose accuracy when the line-of-sight (LOS) path is obstructed or when combinations of reflections and scatterings are present. In this paper, we present a computationally efficient integrated localization and sensing method that delivers superior performance in complex environments. Our method uniformly leverages various types of multipath components (MPCs), including reflections, scatterings, and their combinations, through the lens of random finite sets (RFSs). This eliminates the multipath identification step and streamlines filtering by removing the need for a distinct filter for each multipath type, which previous approaches required. Simulation results demonstrate the robustness and effectiveness of our method, particularly in complex environments where the LOS MPC is obstructed and in the presence of clutter and missed detections of MPC measurements.
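
The robustness to clutter and missed detections stems from modeling measurements as sets. As a hedged illustration, the Python sketch below samples from the standard RFS measurement model used throughout RFS filtering: each MPC is detected independently with probability p_detect (otherwise it is missed), and clutter arrives as a Poisson-distributed number of points. It is not the authors' filter; the scalar measurement, the uniform clutter support, and all names (Measurement, sample_poisson, sample_measurement_set) are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Measurement:
    value: float      # hypothetical scalar MPC measurement (e.g. a range in metres)
    is_clutter: bool  # ground-truth label for inspection; a real filter never sees it


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small clutter rates."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_measurement_set(mpc_values, p_detect, clutter_rate, meas_std, support, rng):
    """Draw one measurement set Z = (detected MPCs) U (Poisson clutter).

    Each hypothetical MPC is detected independently with probability p_detect
    (otherwise it is a missed detection); the clutter count is
    Poisson(clutter_rate) with points uniform on `support`. Order carries no
    information in an RFS, so the result is shuffled.
    """
    z = [Measurement(rng.gauss(x, meas_std), False)
         for x in mpc_values if rng.random() < p_detect]
    lo, hi = support
    z += [Measurement(rng.uniform(lo, hi), True)
          for _ in range(sample_poisson(clutter_rate, rng))]
    rng.shuffle(z)
    return z


if __name__ == "__main__":
    rng = random.Random(7)
    Z = sample_measurement_set([12.0, 18.5, 25.3], p_detect=0.9, clutter_rate=2.0,
                               meas_std=0.1, support=(0.0, 50.0), rng=rng)
    for m in Z:
        print(m)
```

Because a set has no fixed size or ordering, a filter built on this model reasons over all detection/clutter hypotheses jointly instead of first labeling each measurement, which is consistent with the abstract's claim that no separate multipath identification step is needed.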