INSA Rennes, IETR
Abstract:Wireless localization is a fundamental capability of sixth-generation (6G) networks. Conventional model-based methods require accurate modeling of the propagation environment and degrade in complex multipath and non-line-of-sight scenarios, while learning-based methods couple model parameters tightly to the training scene, requiring costly retraining whenever the base station (BS) configuration or propagation environment changes. In this paper, we propose RA-LWLM, a retrieval-augmented in-context localization framework that achieves training-free cross-scene adaptation by externalizing scene-specific information into a per-scene fingerprint database rather than encoding it in model weights. The framework consists of three components: a frozen wireless foundation model (FM) encoder that maps raw channel state information into a scene-agnostic representation; a retrieval module that selects the most informative references from the per-scene database via similarity search in the representation space; and a transformer-based in-context learning (ICL) module that fuses the query with the retrieved references to predict the user equipment (UE) position. To accommodate varying retrieval quality and propagation complexity across queries, the ICL module adopts a mixture-of-experts design in which experts specialize in different context sizes and are softly combined by a learnable selector. Extensive ray-tracing-based experiments across heterogeneous scenes with diverse BS configurations show that RA-LWLM achieves nearly identical accuracy on seen and unseen scenes without any per-scene retraining, substantially outperforming end-to-end and FM-based baselines. These results validate the proposed retrieval-augmented in-context paradigm as a scalable solution for cross-scene localization in 6G networks.
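The retrieve-then-fuse idea in the abstract above can be illustrated with a small sketch. It is not the RA-LWLM implementation. It assumes the embeddings have already been produced by some frozen encoder, uses cosine similarity for retrieval, and replaces the transformer-based ICL module with a plain similarity-weighted average of the retrieved positions. The names `cosine`, `retrieve`, `fuse`, and the `temperature` parameter are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def retrieve(query, database, k):
    """Return the k (similarity, position) pairs most similar to the query.

    `database` is a per-scene list of (embedding, position) fingerprints.
    """
    scored = [(cosine(query, emb), pos) for emb, pos in database]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def fuse(references, temperature=0.1):
    """Stand-in for the ICL module (assumption): softmax-weighted mean position."""
    top = max(sim for sim, _ in references)
    weights = [math.exp((sim - top) / temperature) for sim, _ in references]
    total = sum(weights)
    dim = len(references[0][1])
    return tuple(
        sum(w * pos[d] for w, (_, pos) in zip(weights, references)) / total
        for d in range(dim)
    )
```

Because the scene-specific information lives in `database` rather than in model weights, adapting this sketch to a new scene only means swapping the fingerprint list.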
Abstract:Multi-modal sensing is an important enabler for future environment-aware wireless systems, since a single sensing modality is generally insufficient to provide accurate metric geometry, material awareness, and semantic interpretability in complex environments. This paper presents a measurement-based multi-modal THz sensing and vision framework for indoor environment reconstruction. A three-dimensional monostatic THz channel sounding system operating at 290-310 GHz is integrated with an omnidirectional fisheye camera to acquire radio-frequency and visual observations from a common sensing viewpoint. From the measured THz data, a signal processing pipeline extracts multipath components and infers geometry- and material-consistent structural primitives through trajectory tracking-assisted parameter estimation, graph-based structure discovery, planar reconstruction, and reflection-loss analysis. In parallel, AI-based visual perception modules extract object-level semantic masks and depth priors from panoramic images. To associate these heterogeneous representations, an agentic-AI-based task-driven THz-agent module is developed to select appropriate integration tools according to the attributes of the modality-specific outputs. Through angular alignment and consistency analysis, THz-derived metric geometry and material information are associated with vision-derived semantic regions and depth priors, enabling geometry-consistent and semantically interpretable environment reconstruction directly from measurements. Experimental validation in an indoor L-shaped hallway demonstrates that the proposed framework reconstructs dominant structural elements with centimeter-level accuracy while identifying semantic categories and material attributes of representative indoor objects.
Abstract:The transition to near-field (NF) communications in ultra-massive multiple-input multiple-output (UM-MIMO) systems fundamentally alters the spatial degrees of freedom (DoF) of wireless channels. While the NF DoF of line-of-sight (LoS) transmission channels is well-characterized in the literature, the DoF in NF multipath scenarios remains underexplored. This paper investigates the spatial DoF of NF UM-MIMO channels under practical multipath conditions. A generic DoF metric is derived by modeling multipath propagation and analyzing the resulting eigenvalue distribution based on the Green's function representation of the channel. The DoF contribution of each path is determined by the product of the effective electrical aperture and the subtended solid angle, and the total DoF is obtained through the effective union of spatially resolvable path contributions. A mapping between the eigenvalue distribution and multipath powers is further established. Numerical simulations and real-world NF channel measurements at 28-30 GHz with 720 array elements are conducted for validation in both LoS multipath and non-LoS scenarios. The results show that multipath propagation can significantly increase the spatial DoF and that the proposed metric accurately predicts the DoF of practical NF channels. The proposed framework provides a practical tool for DoF prediction and supports capacity analysis and spatial multiplexing design in future NF UM-MIMO systems.
Abstract:Wireless agentic systems enable agents to autonomously perceive, reason, and act. However, existing works neglect the tight coupling between sensing and control in closed-loop integrated sensing and communication (ISAC) systems. In this paper, we propose an active inference (AIF)-driven wireless agentic system for closed-loop ISAC, which jointly optimizes control and sensing resource allocation via backward-forward message passing on a factor graph. The AIF agent maintains a generative model as a digital twin by integrating a localization model for uncertainty-aware state inference and a localization channel knowledge map (CKM) for approximating observation quality during planning. Simulation results demonstrate that the AIF-enabled agent adaptively allocates sensing resources based on spatially varying channel conditions, achieving a superior balance among tracking accuracy, control effort, and sensing resource consumption compared with baseline strategies.
Abstract:We consider uplink frugal simultaneous localization and mapping (SLAM) in phase-coherent distributed MIMO (D-MIMO) systems, where a network of spatially separated single-antenna access points (APs) coherently receives narrowband, single-snapshot pilot signals from a single-antenna user equipment (UE). In contrast to existing phase-coherent localization and SLAM methods that rely on wideband measurements and/or multi-antenna APs, the proposed frugal setting operates with the minimum possible localization resources: a single subcarrier and a single snapshot at each single-antenna AP. In this paper, we formulate phase-coherent frugal SLAM as a coherent imaging problem, constructing a spatial image over a region of interest by treating the distributed AP observations as coming from a large synthetic aperture. Based on the coherent image, we develop a detection and localization framework that jointly identifies the UE, reflective surfaces, and scatterers. Simulation results validate the proposed framework and provide insights into the impact of grid resolution and off-grid error on detection and localization performance.
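The coherent-imaging formulation in the abstract above can be illustrated with a back-projection (matched-filter) sketch. It is a simplification and not the authors' detection framework. It assumes free-space line-of-sight propagation with unit amplitude, a known wavelength, perfectly phase-coherent APs, and no reflective surfaces or scatterers. The function names, AP layout, and grid are hypothetical.

```python
import cmath
import math


def steering_phase(ap, point, wavelength):
    """Unit-amplitude narrowband response between an AP and a candidate point."""
    distance = math.dist(ap, point)
    return cmath.exp(-2j * math.pi * distance / wavelength)


def simulate_observations(aps, ue, wavelength):
    """One noiseless single-subcarrier, single-snapshot sample per AP (assumption)."""
    return [steering_phase(ap, ue, wavelength) for ap in aps]


def coherent_image(aps, observations, grid, wavelength):
    """Back-project the AP samples onto each grid point and return normalized power.

    The distributed APs act as one large synthetic aperture: at the true UE
    position all phase terms align and the normalized power reaches 1.
    """
    image = {}
    for point in grid:
        acc = sum(
            obs * steering_phase(ap, point, wavelength).conjugate()
            for ap, obs in zip(aps, observations)
        )
        image[point] = abs(acc) ** 2 / len(aps) ** 2
    return image
```

A UE that does not sit on a grid point will not reach the full peak value, which is one way to see why grid resolution and off-grid error affect detection and localization performance.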
Abstract:In this work, we consider end-to-end calibration of an integrated sensing and communication (ISAC) base station (BS) under gain-phase and antenna displacement impairments without collecting signals from predefined positions (labeled data). We consider a BS with two impaired uniform linear arrays used for simultaneous multi-target sensing and communication with a user equipment (UE) leveraging orthogonal frequency-division multiplexing signals. The main contribution is the design of a framework that can compensate for the impairments without labeled data and under coherent received signals. We harness a differentiable precoder based on the maximum array response in an angular direction at the transmitter and the orthogonal matching pursuit (OMP) algorithm at the sensing receiver. We propose an ISAC loss as a combination of sensing and communication losses that provides a trade-off between the two functionalities. We compare two sensing objective alternatives: (i) maximize the maximum response of the angle-delay map of the targets or (ii) minimize the norm of the residual signal at the output of the OMP algorithm after all estimated targets have been removed. The communication objective maximizes the energy of the received signal at the UE. Additionally, our framework leverages an approximation of the channel gradient that avoids the impractical requirement of knowing the true channel gradient. Our results show that the proposed method performs close to a scheme that uses labeled data and knowledge of the channel gradient in terms of sensing position estimation and communication symbol error rate. When comparing the two sensing losses, minimizing the norm of the OMP residual yields significantly better sensing position estimation with slightly increased complexity.
Abstract:Future wireless systems increasingly require predictive and transferable representations that can support multiple physical-layer (PHY) tasks under dynamic environments. However, most existing supervised learning-based methods are designed for a single task, which leads to high adaptation cost. To address this issue, we propose a joint-embedding predictive architecture for multimodal sensing-assisted communications (JEPA-MSAC), a self-supervised multimodal predictive representation learning framework for wireless environments. The proposed framework first maps multimodal sensing and communication measurements into a unified token space, and then pretrains a shared backbone using temporal block-masked JEPA to learn a predictive latent space that captures environment dynamics and cross-modal dependencies. After pretraining, the backbone is frozen and reused as a general future-feature generator, on top of which lightweight task heads are trained for localization, beam prediction, and received signal strength indicator (RSSI) prediction. Extensive experiments show the latent state supports accurate multi-task prediction with low adaptation cost. Additionally, ablation studies reveal its scaling behavior and the impact of key pretraining setups.
Abstract:While the Third Generation Partnership Project (3GPP) has confirmed orthogonal frequency division multiplexing (OFDM) as the baseline waveform for sixth-generation (6G) systems, its performance is severely compromised in the high-mobility scenarios envisioned for 6G. Building upon the GEARBOX-PHY vision, we present gear-switching OFDM (GS-OFDM): a unified framework in which the base station (BS) adaptively selects among three gears, ranging from legacy OFDM to delay-Doppler domain processing, based on the channel mobility conditions experienced by the user equipments (UEs). We illustrate the benefit of adaptive gear switching for communication throughput and conclude with an outlook on research challenges and opportunities.
Abstract:Reconfigurable antennas (RAs) utilize the electromagnetic (EM) domain to provide dynamic control over antenna radiation patterns, which offers an effective way to enhance power efficiency in wireless links. Unlike conventional arrays with fixed element patterns, RAs enable on-demand beam-pattern synthesis by directly controlling each antenna's EM characteristics. While existing research on RAs has primarily focused on improving spectral efficiency, this paper explores their application for downlink localization. Moreover, the majority of existing works focus on far-field scenarios, with little attention paid to the near field (NF). Motivated by these gaps, we consider a synthesis model in which each antenna generates desired beampatterns from a finite set of EM basis functions. We then formulate a joint optimization problem for the baseband (BB) and EM precoders with the objective of minimizing the user equipment (UE) position error bound (PEB) in NF conditions. Our analytical derivations and extensive simulation results demonstrate that the proposed hybrid precoder design for RAs significantly improves UE positioning accuracy compared to traditional non-reconfigurable arrays.
Abstract:Distributed multiple-input multiple-output (MIMO) architectures enable large-scale integrated sensing and communication (ISAC) by providing high spatial resolution and robustness through spatial diversity. However, practical phase-coherent sensing is challenged by phase synchronization errors and modeling mismatch caused by grid discretization. Existing over-the-air (OTA) synchronization methods typically treat synchronization and sensing tasks separately, which may lead to inaccurate phase alignment when multipath components are used for imaging. In this paper, we propose a non-line-of-sight (NLOS)-aided joint OTA synchronization and off-grid imaging framework for distributed MIMO ISAC systems. First, a line-of-sight (LOS)-assisted coarse synchronization is performed to establish initial phase coherence across distributed links. Subsequently, an iterative refinement stage exploits reconstructed NLOS components obtained from imaging results. By modeling off-grid effects via a first-order Taylor expansion, we transform measurements with nonlinear off-grid offsets into an augmented linear model with jointly sparse reflectivity and offset variables. The imaging problem is reformulated as a structured sparse recovery task and solved using a tailored off-grid approximate message passing (OG-AMP) algorithm. The imaging and synchronization modules are coupled within a closed-loop alternating optimization framework, where improved imaging enables more accurate phase refinement, and vice versa. Numerical results show that the proposed framework achieves accurate synchronization and imaging under phase errors. Compared with conventional approaches, it shows superior robustness and accuracy.
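The first-order Taylor step mentioned in the abstract above can be written out in a generic form. This is a standard off-grid linearization shown for illustration, not the paper's exact model. The notation is assumed: $\mathbf{y}$ is the measurement vector, $\mathbf{a}(\cdot)$ the response to a reflector at a given location parameter, $\bar{\theta}_g$ the $g$-th grid point, $\delta_g$ its off-grid offset, $x_g$ its reflectivity, and $\mathbf{n}$ the noise. A scalar location parameter is used for brevity.

```latex
% Assumed generic notation; scalar location parameter for brevity.
\begin{align}
  \mathbf{y} &= \sum_{g=1}^{G} x_g\, \mathbf{a}\!\left(\bar{\theta}_g + \delta_g\right) + \mathbf{n}, \\
  \mathbf{a}\!\left(\bar{\theta}_g + \delta_g\right)
    &\approx \mathbf{a}\!\left(\bar{\theta}_g\right)
      + \delta_g\, \left.\frac{\partial \mathbf{a}(\theta)}{\partial \theta}\right|_{\theta = \bar{\theta}_g}, \\
  \mathbf{y} &\approx
    \begin{bmatrix} \mathbf{A} & \mathbf{B} \end{bmatrix}
    \begin{bmatrix} \mathbf{x} \\ \boldsymbol{\delta} \odot \mathbf{x} \end{bmatrix}
    + \mathbf{n},
\end{align}
% A stacks a(\bar{\theta}_g) column-wise, B stacks the corresponding derivatives.
```

Here $\mathbf{x}$ and $\boldsymbol{\delta} \odot \mathbf{x}$ share the same support, since an empty grid cell contributes neither a reflectivity nor an offset term. This shared support is what makes the augmented unknown jointly sparse.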