Abstract: Current limitations in wireless modeling and radio frequency (RF)-based AI are driven primarily by a lack of high-quality, measurement-based datasets that connect RF signals to their physical environments. RF heatmaps, the typical form of such data, are high-dimensional and complex but lack the geometric and semantic context needed for interpretation, constraining the development of supervised machine learning models. To address this bottleneck, we propose a new class of multimodal datasets that combine RF measurements with auxiliary modalities such as high-resolution cameras and lidar to bridge the gap between RF signals and their physical causes. The proposed data collection will span diverse indoor and outdoor environments, featuring both static and dynamic scenarios, including human activities ranging from walking to subtle gestures. By achieving precise spatial and temporal co-registration and creating digital replicas for voxel-level annotation, this dataset will enable transformative AI research. Key tasks include the forward problem of predicting RF heatmaps from visual data to revolutionize wireless system design, and the inverse problem of inferring scene semantics from RF signals, creating a new form of RF-based perception.
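The forward problem named above is, in effect, image-to-heatmap regression over co-registered pairs. The sketch below shows a minimal formulation of that task, assuming a toy CNN and synthetic tensors in place of the real co-registered data; every shape, layer choice, and the training setup are illustrative assumptions, not part of the proposed dataset or its benchmarks.

```python
# Minimal sketch of the "forward problem" as supervised learning:
# given a co-registered camera view, regress the matching RF heatmap.
# Architecture, shapes, and the random stand-in data are assumptions.
import torch
import torch.nn as nn

class Image2Heatmap(nn.Module):
    """Toy encoder-decoder: RGB image (3x64x64) -> RF heatmap (1x64x64)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),  # per-pixel received-power estimate
        )

    def forward(self, x):
        return self.net(x)

model = Image2Heatmap()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in for one batch of co-registered (image, heatmap) pairs.
images = torch.randn(8, 3, 64, 64)
heatmaps = torch.randn(8, 1, 64, 64)

for step in range(100):
    loss = nn.functional.mse_loss(model(images), heatmaps)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The inverse problem would swap the roles of the two modalities, mapping RF measurements back to scene semantics; the voxel-level annotations described above would then serve as the supervision target.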
Abstract: Accurately modeling millimeter-wave (mmWave) propagation is essential for real-time AR and autonomous systems. Differentiable ray tracing offers a physics-grounded solution but still faces deployment challenges due to its over-reliance on exhaustive channel measurements or on brittle, hand-tuned scene models for material properties. We present VisRFTwin, a scalable and data-efficient digital-twin framework that integrates vision-derived material priors with differentiable ray tracing. Multi-view images from commodity cameras are processed by a frozen Vision-Language Model to extract dense semantic embeddings, which are translated into initial estimates of permittivity and conductivity for scene surfaces. These priors initialize a Sionna-based differentiable ray tracer, which rapidly calibrates material parameters via gradient descent with only a few dozen sparse channel soundings. Once calibrated, the association between vision features and material parameters is retained, enabling fast transfer to new scenarios without repeated calibration. Evaluations across three real-world scenarios (office interiors, urban canyons, and dynamic public spaces) show that VisRFTwin reduces channel measurement needs by up to 10$\times$ while achieving a 59% lower median delay spread error than purely data-driven deep learning methods.
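The calibration loop described above can be sketched end to end: a small head maps frozen VLM embeddings to physically constrained material parameters, and those parameters are refined by gradient descent against sparse soundings through a differentiable forward model. In the sketch, `toy_ray_trace` is a hypothetical stand-in for the Sionna-based differentiable ray tracer; the embedding size, surface count, and placeholder physics are all assumptions for illustration only.

```python
# Minimal sketch of VisRFTwin-style calibration, under stated assumptions.
import torch
import torch.nn as nn

EMB_DIM = 512  # assumed VLM embedding size
N_SURF = 4     # assumed number of scene surfaces

class MaterialHead(nn.Module):
    """Maps a per-surface semantic embedding to material parameters."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(EMB_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 2))

    def forward(self, emb):
        raw = self.mlp(emb)
        eps_r = 1.0 + nn.functional.softplus(raw[:, 0])  # permittivity >= 1
        sigma = nn.functional.softplus(raw[:, 1])        # conductivity >= 0
        return eps_r, sigma

def toy_ray_trace(eps_r, sigma, path_lengths):
    """Hypothetical differentiable channel model: maps material parameters
    and geometry to a predicted observable (placeholder physics)."""
    gains = 1.0 / (eps_r + sigma * path_lengths)
    return gains.sum(dim=-1, keepdim=True)

head = MaterialHead()
opt = torch.optim.Adam(head.parameters(), lr=1e-2)

embeddings = torch.randn(N_SURF, EMB_DIM)  # frozen VLM features (fixed)
path_lengths = torch.rand(N_SURF)          # stand-in scene geometry
soundings = torch.tensor([0.8])            # one sparse measured observation

for step in range(200):
    eps_r, sigma = head(embeddings)
    pred = toy_ray_trace(eps_r, sigma, path_lengths)
    loss = nn.functional.mse_loss(pred, soundings)
    opt.zero_grad()
    loss.backward()
    opt.step()
# The trained head retains the vision-to-material association, so new
# scenes can reuse it without repeating the measurement campaign.
```

Learning the head (rather than per-surface free parameters) is what enables the transfer claim: the mapping from appearance to materials generalizes, while direct per-scene calibration would have to start from scratch in every new environment.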
Abstract: With the development of Integrated Sensing and Communication (ISAC) for Sixth-Generation (6G) wireless systems, contactless human recognition has emerged as one of the key application scenarios. Since human gesture motion induces subtle and random variations in wireless multipath propagation, accurately modeling human gesture channels has become a crucial issue for the design and validation of ISAC systems. To this end, this paper proposes a deep learning-based human gesture channel modeling framework for ISAC scenarios, in which the human body is decomposed into multiple body parts, and the mapping between human gestures and their corresponding multipath characteristics is learned from real-world measurements. Specifically, a Poisson neural network is employed to predict the number of Multi-Path Components (MPCs) for each human body part, while Conditional Variational Auto-Encoders (C-VAEs) are used to generate the scattering points, which are further used to reconstruct continuous channel impulse responses and micro-Doppler signatures. Simulation results demonstrate that the proposed method achieves high accuracy and generalizes well across different gestures and subjects, providing an interpretable approach for data augmentation and the evaluation of gesture-based ISAC systems.
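The two-stage generative pipeline can be illustrated with a minimal sketch: a Poisson-rate network predicts the MPC count for one body part, and a conditional decoder then samples that many scattering points given the gesture label. Network sizes, the 4-D point parameterization (3-D position plus amplitude), and the one-hot gesture encoding are illustrative assumptions, not the paper's exact design; a full C-VAE would also include the encoder and KL term used during training.

```python
# Minimal sketch of the Poisson-count + conditional-decoder pipeline,
# under the assumptions stated in the lead-in.
import torch
import torch.nn as nn

N_GESTURES, LATENT = 5, 8  # assumed gesture vocabulary and latent size

class PoissonRateNet(nn.Module):
    """Predicts the Poisson rate of MPCs for one body part / gesture."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(N_GESTURES, 32), nn.ReLU(),
                                 nn.Linear(32, 1), nn.Softplus())

    def forward(self, gesture_onehot):
        return self.net(gesture_onehot).squeeze(-1)  # rate > 0

class CVAEDecoder(nn.Module):
    """Decodes latent noise + gesture condition into one scattering point
    (3-D position plus amplitude) per sampled MPC."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT + N_GESTURES, 64),
                                 nn.ReLU(), nn.Linear(64, 4))

    def forward(self, z, gesture_onehot):
        return self.net(torch.cat([z, gesture_onehot], dim=-1))

rate_net, decoder = PoissonRateNet(), CVAEDecoder()

gesture = torch.zeros(1, N_GESTURES)
gesture[0, 2] = 1.0                       # e.g. gesture class 2

rate = rate_net(gesture)                  # predicted mean MPC count
n_mpc = int(torch.distributions.Poisson(rate).sample().item())

z = torch.randn(max(n_mpc, 1), LATENT)    # one latent draw per MPC
cond = gesture.expand(z.shape[0], -1)
points = decoder(z, cond)                 # (n_mpc, 4): x, y, z, amplitude
# `points` would then feed a reconstruction step that sums per-part MPC
# contributions into channel impulse responses and micro-Doppler signatures.
```

Separating the count model from the point generator keeps the pipeline interpretable: the Poisson rate captures how strongly each body part scatters under a given gesture, while the decoder captures where and how strongly those scattering points appear.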