Abstract:Large AI models have been widely adopted in wireless communications for channel modeling, beamforming, and resource optimization. However, most existing efforts remain limited to single-modality inputs and channel-specific objec- tives, overlooking the broader potential of large foundation models for unified wireless sensing. To bridge this gap, we propose MMSense, a multi-modal, multi-task foundation model that jointly addresses channel-centric, environment-aware, and human-centered sensing. Our framework integrates image, radar, LiDAR, and textual data by transforming them into vision- compatible representations, enabling effective cross-modal align- ment within a unified feature space. A modality gating mecha- nism adaptively fuses these representations, while a vision-based large language model backbone enables unified feature align- ment and instruction-driven task adaptation. Furthermore, task- specific sequential attention and uncertainty-based loss weighting mechanisms enhance cross-task generalization. Experiments on real wireless scenario datasets show that our approach outper- forms both task-specific and large-model baselines, confirming its strong generalization across heterogeneous sensing tasks.




Abstract:This paper presents a comprehensive real-world and Digital Twin (DT) dataset collected as part of the Find A Rover (AFAR) Challenge, organized by the NSF Aerial Experimentation and Research Platform for Advanced Wireless (AERPAW) testbed and hosted at the Lake Wheeler Field in Raleigh, North Carolina. The AFAR Challenge was a competition involving five finalist university teams, focused on promoting innovation in UAV-assisted radio frequency (RF) source localization. Participating teams were tasked with designing UAV flight trajectories and localization algorithms to detect the position of a hidden unmanned ground vehicle (UGV), also referred to as a rover, emitting wireless probe signals generated by GNU Radio. The competition was structured to evaluate solutions in a DT environment first, followed by deployment and testing in AERPAW's outdoor wireless testbed. For each team, the UGV was placed at three different positions, resulting in a total of 30 datasets, 15 collected in a DT simulation environment and 15 in a physical outdoor testbed. Each dataset contains time-synchronized measurements of received signal strength (RSS), received signal quality (RSQ), GPS coordinates, UAV velocity, and UAV orientation (roll, pitch, and yaw). Data is organized into structured folders by team, environment (DT and real-world), and UGV location. The dataset supports research in UAV-assisted RF source localization, air-to-ground (A2G) wireless propagation modeling, trajectory optimization, signal prediction, autonomous navigation, and DT validation. With approximately 300k time-synchronized samples collected from real-world experiments, the dataset provides a substantial foundation for training and evaluating deep learning (DL) models. Overall, the AFAR dataset serves as a valuable resource for advancing robust, real-world solutions in UAV-enabled wireless communications and sensing systems.