Abstract:Fingerprinting-based localization often suffers from poor cross-environment generalization, especially when only a few labeled samples are available in the target environment. Existing methods mitigate distribution shifts through domain adaptation or improved signal representations, but they usually ignore environmental geometry or use it in a deterministic manner, limiting their ability to capture diverse multipath variations in complex propagation conditions. To address this issue, we propose EnvCoLoc, an environment-conditioned diffusion meta-learning framework for few-shot fingerprinting localization. EnvCoLoc extracts structured descriptors from 3D point clouds and uses them to condition a latent diffusion generator, which produces environment-specific parameter offsets to modulate a shared meta-learned initialization. This design injects geometry-aware priors into the adaptation process and provides more informative initializations for new environments. To learn the stochastic mapping from coarse environmental descriptors to high-dimensional parameter corrections under limited data, the diffusion generator and localization network are jointly optimized within a two-loop meta-learning framework. The generated offsets capture systematic environment-dependent variations, while gradient-based inner-loop adaptation further refines the model to reduce residual task-specific mismatch. We also provide an excess-loss analysis for finite-step adaptation, theoretically supporting the benefit of geometry-aware initialization. Real-world experiments show that EnvCoLoc consistently improves localization accuracy over baseline methods, achieving up to a 20.0% reduction in mean localization error in NLOS scenarios with only 10 support samples.
Abstract:Unified 2D and 3D radio map construction supports network planning, wireless digital twins, and unmanned aerial vehicle (UAV) applications. In urban environments, blockage, reflection, and diffraction make accurate construction expensive for physics-based solvers. Autoregressive next-token prediction offers a single sequential formulation that can cover both 2D and 3D generation, but standard raster ordering ignores the spatial structure of radio propagation. When generation follows propagation, each token is predicted from propagation-relevant history rather than spatially arbitrary context, which provides more causally informative conditioning and lowers conditional uncertainty. We propose PILOT, a pretrained autoregressive framework that replaces raster scan with a wavefront sequence expanding outward from the transmitter. Each prediction step is guided by an environment-aware instruction that spatially aligns environment features with the queried radio map region. The same framework extends to 3D radio maps through height-slice stacking while a gradient loss enforces vertical continuity. On standard 2D benchmarks, PILOT achieves the lowest NMSE among all baselines. For volumetric generation, it reduces NMSE by 78% relative to the diffusion baseline at roughly $2500\times$ faster inference. It also outperforms methods that rely on 10% sparse measurements and achieves the best zero-shot results in the cross-domain evaluation.
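The contrast between raster ordering and a transmitter-centered wavefront ordering can be illustrated with a small sketch. The abstract does not specify how PILOT builds its sequence, so the Euclidean-distance ordering, the function name `wavefront_order`, and the raster tie-break below are assumptions made purely for illustration.

```python
import math

def wavefront_order(height, width, tx):
    """Sort grid cells by Euclidean distance from the transmitter cell tx=(row, col).

    Hypothetical helper: distance-based ordering is an assumed stand-in for a
    wavefront sequence; ties are broken by raster order for determinism.
    """
    cells = [(r, c) for r in range(height) for c in range(width)]
    return sorted(
        cells,
        key=lambda rc: (math.hypot(rc[0] - tx[0], rc[1] - tx[1]), rc),
    )

# 3x3 grid with the transmitter in the center cell.
order = wavefront_order(3, 3, (1, 1))
```

Under this ordering the transmitter cell comes first, followed by its four edge neighbors and then the corners, so each token is conditioned on cells closer to the source rather than on whatever happens to precede it in a row-major scan.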
Abstract:Unmanned aerial vehicle (UAV) downlink transmission facilitates critical time-sensitive visual applications but is fundamentally constrained by bandwidth scarcity and dynamic channel impairments. The rapid fluctuation of the air-to-ground (A2G) link creates a regime where reliable transmission slots are intermittent and future channel quality can only be predicted with uncertainty. Conventional deep joint source-channel coding (DeepJSCC) methods transmit coupled feature streams, causing global reconstruction failure when specific time slots experience deep fading. Decoupling semantic content into a deterministic structure component and a stochastic texture component enables differentiated error protection strategies aligned with channel reliability. A predictive transmission framework is developed that utilizes a split-stream variational codec and a channel-aware scheduler to prioritize the delivery of structural layout over reliable slots. Experimental evaluations indicate that this approach achieves a 5.6 dB gain in peak signal-to-noise ratio (PSNR) over single-stream baselines and maintains structural fidelity under significant prediction mismatch.
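The idea of sending the structure stream over the most reliable slots can be sketched as a simple ranking step. The abstract does not describe the scheduler's actual policy, so the greedy top-k rule, the function name `schedule_streams`, and the use of predicted SNR as the reliability score are assumptions for illustration only.

```python
def schedule_streams(predicted_snr_db, n_structure):
    """Assign the n_structure best predicted slots to the structure stream.

    Hypothetical greedy rule: rank slots by predicted SNR (dB), give the top
    n_structure slots to structure packets and the rest to texture packets.
    """
    ranked = sorted(
        range(len(predicted_snr_db)),
        key=lambda i: predicted_snr_db[i],
        reverse=True,
    )
    structure_slots = sorted(ranked[:n_structure])
    texture_slots = sorted(ranked[n_structure:])
    return structure_slots, texture_slots

# Five upcoming slots with predicted SNR values; two slots carry structure.
structure, texture = schedule_streams([3.0, 12.5, -1.0, 9.0, 6.5], 2)
```

Here slots 1 and 3 carry the structure stream, so a deep fade in one of the remaining slots would degrade only texture detail in this toy setup.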
Abstract:The deployment of pixel-based antennas and fluid antenna systems (FAS) is hindered by prohibitive channel state information (CSI) acquisition overhead. While radio maps enable proactive mode selection, reconstructing high-fidelity maps from sparse measurements is challenging. Existing physics-agnostic or data-driven methods often fail to recover fine-grained shadowing details under extreme sparsity. We propose a Physics-Regularized Low-Rank Tensor Completion (PR-LRTC) framework for radio map reconstruction. By modeling the signal field as a three-way tensor, we integrate environmental low-rankness with deterministic antenna physics. Specifically, we leverage Effective Aerial Degrees-of-Freedom (EADoF) theory to derive a differential gain topology map as a physical prior for regularization. The resulting optimization problem is solved via an efficient Alternating Direction Method of Multipliers (ADMM)-based algorithm. Simulations show that PR-LRTC achieves a 4 dB gain over baselines at a 10% sampling ratio. It effectively preserves sharp shadowing edges, providing a robust, physics-compliant solution for low-overhead beam management.
Abstract:As wireless networks progress toward sixth-generation (6G), understanding the spatial distribution of directional beam coverage becomes increasingly important for beam management and link optimization. A multiple-input multiple-output (MIMO) beam map provides such spatial awareness, yet accurate construction under sparse measurements remains difficult due to incomplete spatial coverage and strong angular variations. This paper presents a tensor decomposition approach for reconstructing MIMO beam maps from limited measurements. By transforming measurements from a Cartesian coordinate system into a polar coordinate system, we uncover a matrix-vector outer-product structure associated with different propagation conditions. Specifically, we mathematically demonstrate that the matrix factor, representing beam-space gain, exhibits an intrinsic Toeplitz structure due to the shift-invariant nature of array responses, and the vector factor captures distance-dependent attenuation. Leveraging these structural priors, we formulate a regularized tensor decomposition problem to jointly reconstruct line-of-sight (LOS), reflection, and obstruction propagation conditions. Simulation results confirm that the proposed method significantly enhances data efficiency, achieving a normalized mean square error (NMSE) reduction of over 20% compared to state-of-the-art baselines, even under sparse sampling regimes.
Abstract:Socially aware navigation is essential for mobile robots that coexist with humans. Yet existing studies in this area focus mainly on path efficiency and pedestrian collision avoidance, which are essential but represent only a fraction of social navigation. Beyond these basics, robots must also comply with user instructions, aligning their actions to task goals and social norms expressed by humans. In this work, we present LISN-Bench, the first simulation-based benchmark for language-instructed social navigation. Built on Rosnav-Arena 3.0, it is the first standardized social navigation benchmark to incorporate instruction following and scene understanding across diverse contexts. To address this task, we further propose Social-Nav-Modulator, a fast-slow hierarchical system where a VLM agent modulates costmaps and controller parameters. Decoupling low-level action generation from the slower VLM loop reduces reliance on high-frequency VLM inference while improving dynamic avoidance and perception adaptability. Our method achieves an average success rate of 91.3%, which is more than 63% higher than that of the most competitive baseline, with most of the improvements observed in challenging tasks such as following a person in a crowd and navigating while strictly avoiding instruction-forbidden regions. The project website is at: https://social-nav.github.io/LISN-project/




Abstract:Radio maps that describe spatial variations in wireless signal strength are widely used to optimize networks and support aerial platforms. Their construction requires location-labeled signal measurements from distributed users, raising fundamental concerns about location privacy. Even when raw data are kept local, the shared model updates can reveal user locations through their spatial structure, while naive noise injection either fails to hide this leakage or degrades model accuracy. This work analyzes how location leakage arises from gradients in a virtual-environment radio map model and proposes a geometry-aligned differential privacy mechanism with heterogeneous noise tailored to both confuse localization and cover gradient spatial patterns. The approach is theoretically supported with a convergence guarantee linking privacy strength to learning accuracy. Numerical experiments show the approach increases attacker localization error from 30 m to over 180 m, with only 0.2 dB increase in radio map construction error compared to a uniform-noise baseline.
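The difference between uniform and heterogeneous noise can be sketched with a clipped-gradient Gaussian perturbation whose standard deviation varies per coordinate. The abstract does not give the geometry-aligned mechanism itself, so the function name `privatize_gradient`, the L2 clipping step, and the per-coordinate `sigmas` are assumptions, and the snippet by itself makes no claim about a formal privacy guarantee.

```python
import math
import random

def privatize_gradient(grad, clip_norm, sigmas, rng):
    """Clip a gradient to L2 norm clip_norm, then add per-coordinate Gaussian noise.

    Hypothetical sketch: sigmas[i] is the noise standard deviation for
    coordinate i, so location-revealing coordinates can receive more noise
    than the others (how to choose them is not specified in the abstract).
    """
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale + rng.gauss(0.0, s) for g, s in zip(grad, sigmas)]

# Seeded generator for reproducibility; the first coordinate gets more noise.
rng = random.Random(0)
noisy = privatize_gradient([3.0, 4.0], clip_norm=1.0, sigmas=[0.5, 0.1], rng=rng)
```

A uniform-noise baseline corresponds to setting every entry of `sigmas` to the same value.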
Abstract:Non-prehensile (NP) manipulation, in which robots alter object states without forming stable grasps (for example, pushing, poking, or sliding), significantly broadens robotic manipulation capabilities when grasping is infeasible or insufficient. However, enabling a unified framework that generalizes across different tasks, objects, and environments while seamlessly integrating non-prehensile and prehensile (P) actions remains challenging: robots must determine when to invoke NP skills, select the appropriate primitive for each context, and compose P and NP strategies into robust, multi-step plans. We introduce AdaptPNP, a vision-language model (VLM)-empowered task and motion planning framework that systematically selects and combines P and NP skills to accomplish diverse manipulation objectives. Our approach leverages a VLM to interpret visual scene observations and textual task descriptions, generating a high-level plan skeleton that prescribes the sequence and coordination of P and NP actions. A digital-twin-based object-centric intermediate layer predicts desired object poses, enabling proactive mental rehearsal of manipulation sequences. Finally, a control module synthesizes low-level robot commands, with continuous execution feedback enabling online task plan refinement and adaptive replanning through the VLM. We evaluate AdaptPNP across representative P&NP hybrid manipulation tasks in both simulation and real-world environments. These results underscore the potential of hybrid P&NP manipulation as a crucial step toward general-purpose, human-level robotic manipulation capabilities. Project Website: https://sites.google.com/view/adaptpnp/home
Abstract:This paper proposes a novel structure-aware matrix completion framework assisted by radial basis function (RBF) interpolation for near-field radio map construction in extremely large multiple-input multiple-output (XL-MIMO) systems. Unlike the far-field scenario, near-field wavefronts exhibit strong dependencies on both angle and distance due to spherical wave propagation, leading to complicated variations in received signal strength (RSS). To effectively capture the intricate spatial variation structure inherent in near-field environments, a regularized RBF interpolation method is developed to enhance radio map reconstruction accuracy. Leveraging theoretical insights from interpolation error analysis of RBF, an inverse μ-law-inspired nonuniform sampling strategy is introduced to allocate measurements adaptively, emphasizing regions with rapid RSS variations near the transmitter. To further exploit the global low-rank structure in the near-field radio map, we integrate RBF interpolation with nuclear norm minimization (NNM)-based matrix completion. A robust Huberized leave-one-out cross-validation (LOOCV) scheme is then proposed for adaptive selection of the tolerance parameter, facilitating optimal fusion between RBF interpolation and matrix completion. The integration of local variation structure modeling via RBF interpolation and global low-rank structure exploitation via matrix completion yields a structure-aware framework that substantially improves the accuracy of near-field radio map reconstruction. Extensive simulations demonstrate that the proposed approach achieves over 10% improvement in normalized mean squared error (NMSE) compared to standard interpolation and matrix completion methods under varying sampling densities and shadowing conditions.
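The effect of an inverse μ-law sampling rule can be sketched with the standard μ-law expander, which maps a uniform grid on [0, 1] to points that cluster near zero. The paper's exact sampling law is not given in the abstract, so the expander formula as the sampling rule, the default μ = 255, and the helper names below are assumptions for illustration.

```python
def inverse_mu_law(u, mu=255.0):
    """Standard mu-law expander on [0, 1]: F^{-1}(u) = ((1 + mu)^u - 1) / mu."""
    return ((1.0 + mu) ** u - 1.0) / mu

def sample_radii(n, r_max, mu=255.0):
    """Map n uniformly spaced values in [0, 1] to radii in [0, r_max].

    Hypothetical helper (requires n >= 2): the expander packs radii densely
    near the transmitter at r = 0 and sparsely toward r_max.
    """
    return [r_max * inverse_mu_law(i / (n - 1), mu) for i in range(n)]

# Five measurement radii over a 100 m range (units are an assumption).
radii = sample_radii(5, 100.0)
```

With μ = 255 the middle sample falls at roughly 5.9% of the range, so most measurements land close to the transmitter, where RSS varies fastest.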




Abstract:Radio maps are essential for enhancing wireless communications and localization. However, existing methods for constructing radio maps typically require costly calibration processes to collect location-labeled channel state information (CSI) datasets. This paper aims to recover the data collection trajectory directly from the channel propagation sequence, eliminating the need for location calibration. The key idea is to employ a hidden Markov model (HMM)-based framework to conditionally model the channel propagation matrix, while simultaneously modeling the location correlation in the trajectory. The primary challenges involve modeling the complex relationship between channel propagation in multiple-input multiple-output (MIMO) networks and geographical locations, and addressing both line-of-sight (LOS) and non-line-of-sight (NLOS) indoor conditions. In this paper, we propose an HMM-based framework that jointly characterizes the conditional propagation model and the evolution of the user trajectory. Specifically, the channel propagation in MIMO networks is modeled separately in terms of power, delay, and angle, with distinct models for LOS and NLOS conditions. The user trajectory is modeled using a Gaussian-Markov model. The parameters for channel propagation, the mobility model, and LOS/NLOS classification are optimized simultaneously. Experimental validation using simulated MIMO-Orthogonal Frequency-Division Multiplexing (OFDM) networks with a multi-antenna uniform linear array (ULA) configuration demonstrates that the proposed method achieves an average localization accuracy of 0.65 meters in an indoor environment, covering both LOS and NLOS regions. Moreover, the constructed radio map enables localization with a reduced error compared to conventional supervised methods, such as k-nearest neighbors (KNN), support vector machine (SVM), and deep neural network (DNN).
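The joint modeling of propagation and trajectory described above follows the generic HMM factorization, written out below as a sketch. The symbols are assumptions rather than the paper's notation: $x_t$ denotes the hidden user location at step $t$, $y_t$ the observed channel features (power, delay, angle), $\theta$ the propagation parameters, and $\sigma^2$ the variance of an assumed random-walk form of the Gaussian-Markov transition.

```latex
% Generic HMM factorization over hidden locations x_t and observations y_t
p(x_{1:T}, y_{1:T}) = p(x_1) \prod_{t=2}^{T} p(x_t \mid x_{t-1}) \prod_{t=1}^{T} p(y_t \mid x_t; \theta)

% One simple Gaussian-Markov transition (assumed form)
x_t \mid x_{t-1} \sim \mathcal{N}\!\left(x_{t-1}, \sigma^2 I\right)

% Trajectory recovery as maximum a posteriori estimation
\hat{x}_{1:T} = \arg\max_{x_{1:T}} \; p(x_{1:T} \mid y_{1:T})
```

In this form, the emission term ties each location to its channel observation, while the transition term penalizes physically implausible jumps between consecutive locations.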