Abstract:AI-communication integration is widely regarded as a core enabling technology for 6G. Most existing AI-based physical-layer designs rely on task-specific models that are separately tailored to individual modules, resulting in poor generalization. In contrast, communication systems are inherently general-purpose and should support broad applicability and robustness across diverse scenarios. Foundation models offer a promising solution through strong reasoning and generalization, yet wireless-system constraints hinder a direct transfer of large language model (LLM)-style success to the wireless domain. Therefore, we introduce the concept of large wireless foundation models (LWFMs) and present a novel framework for empowering the physical layer with foundation models under wireless constraints. Specifically, we propose two paradigms for realizing LWFMs, including leveraging existing general-purpose foundation models and building novel wireless foundation models. Based on recent progress, we distill two roadmaps for each paradigm and formulate design principles under wireless constraints. We further provide case studies of LWFM-empowered wireless systems to intuitively validate their advantages. Finally, we characterize the notion of "large" in LWFMs through a multidimensional analysis of existing work and outline promising directions for future research.
Abstract:The growing adoption of sensor-rich intelligent systems has boosted the use of multi-modal sensing to improve wireless communications. However, traditional methods require extensive manual design of data preprocessing, network architecture, and task-specific fine-tuning, which limits both development scalability and real-world deployment. To address this, we propose WiFo-M$^2$, a foundation model that can be easily plugged into existing deep learning-based transceivers for universal performance gains. To extract generalizable out-of-band (OOB) channel features from multi-modal sensing, we introduce ContraSoM, a contrastive pre-training strategy. Once pre-trained, WiFo-M$^2$ infers future OOB channel features from historical sensor data and strengthens feature robustness via modality-specific data augmentation. Experiments show that WiFo-M$^2$ improves performance across multiple transceiver designs and demonstrates strong generalization to unseen scenarios.
Abstract:Accurate precoding in massive multiple-input multiple-output (MIMO) frequency-division duplexing (FDD) systems relies on efficient channel state information (CSI) acquisition. End-to-end learning frameworks improve performance by jointly optimizing this process, but they lack scalability and fail to generalize across different system configurations, such as varying numbers of antennas and users. To overcome this limitation, we introduce WiFo-E, a wireless foundation model designed for scalable end-to-end precoding. WiFo-E employs multi-task pretraining on a diverse set of configurations to learn transferable representations of underlying wireless principles. Central to the model is a sparse Mixture-of-Experts (MoE) Transformer architecture, which mitigates task interference and enhances training efficiency by activating specialized parameter subsets adaptively. Extensive simulations demonstrate that WiFo-E outperforms conventional per-configuration training and shows strong generalization to unseen system configurations, providing a flexible and efficient foundation for adaptive massive MIMO precoding.
Abstract:Travel planning is a natural real-world task to test large language models (LLMs) planning and tool-use abilities. Although prior work has studied LLM performance on travel planning, existing settings still differ from real-world needs, mainly due to limited domain coverage, insufficient modeling of users' implicit preferences in multi-turn conversations, and a lack of clear evaluation of agents' capability boundaries. To mitigate these gaps, we propose \textbf{TravelBench}, a benchmark for fully real-world travel planning. We collect user queries, user profile and tools from real scenarios, and construct three subtasks-Single-Turn, Multi-Turn, and Unsolvable-to evaluate agent's three core capabilities in real settings: (1) solving problems autonomously, (2) interacting with users over multiple turns to refine requirements, and (3) recognizing the limits of own abilities. To enable stable tool invocation and reproducible evaluation, we cache real tool-call results and build a sandbox environment that integrates ten travel-related tools. Agents can combine these tools to solve most practical travel planning problems, and our systematic verification demonstrates the stability of the proposed benchmark. We further evaluate multiple LLMs on TravelBench and conduct an in-depth analysis of their behaviors and performance. TravelBench provides a practical and reproducible evaluation benchmark to advance research on LLM agents for travel planning.\footnote{Our code and data will be available after internal review.
Abstract:Multi-user signal demodulation is critical to wireless communications, directly impacting transmission reliability and efficiency. However, existing demodulators underperform in generic multi-user environments: classical demodulators struggle to balance accuracy and complexity, while deep learning-based methods lack adaptability under heterogeneous configurations. Although diffusion models have been introduced for demodulation, their flexibility remains limited for practical use. To address these issues, this work proposes WiFo-MUD, a universal diffusion-based foundation model for multi-user demodulation. The model aligns inter-user signal-to-noise ratio imbalance and performs conditional denoising via a customized backbone. Furthermore, a communication-aware consistency distillation method and a dynamic user-grouping strategy are devised to enhance inference. WiFo-MUD achieves state-of-the-art results on large-scale heterogeneous datasets, demonstrating efficient inference and strong generalization across varying system configurations.
Abstract:We present STAgent, an agentic large language model tailored for spatio-temporal understanding, designed to solve complex tasks such as constrained point-of-interest discovery and itinerary planning. STAgent is a specialized model capable of interacting with ten distinct tools within spatio-temporal scenarios, enabling it to explore, verify, and refine intermediate steps during complex reasoning. Notably, STAgent effectively preserves its general capabilities. We empower STAgent with these capabilities through three key contributions: (1) a stable tool environment that supports over ten domain-specific tools, enabling asynchronous rollout and training; (2) a hierarchical data curation framework that identifies high-quality data like a needle in a haystack, curating high-quality queries with a filter ratio of 1:10,000, emphasizing both diversity and difficulty; and (3) a cascaded training recipe that starts with a seed SFT stage acting as a guardian to measure query difficulty, followed by a second SFT stage fine-tuned on queries with high certainty, and an ultimate RL stage that leverages data of low certainty. Initialized with Qwen3-30B-A3B to establish a strong SFT foundation and leverage insights into sample difficulty, STAgent yields promising performance on TravelBench while maintaining its general capabilities across a wide range of general benchmarks, thereby demonstrating the effectiveness of our proposed agentic model.
Abstract:Large language model (LLM) agents have demonstrated strong capabilities in planning and tool use. Travel planning provides a natural and high-impact testbed for these capabilities, as it requires multi-step reasoning, iterative preference elicitation through interaction, and calls to external tools under evolving constraints. Prior work has studied LLMs on travel-planning tasks, but existing settings are limited in domain coverage and multi-turn interaction. As a result, they cannot support dynamic user-agent interaction and therefore fail to comprehensively assess agent capabilities. In this paper, we introduce TravelBench, a real-world travel-planning benchmark featuring multi-turn interaction and tool use. We collect user requests from real-world scenarios and construct three subsets-multi-turn, single-turn, and unsolvable-to evaluate different aspects of agent performance. For stable and reproducible evaluation, we build a controlled sandbox environment with 10 travel-domain tools, providing deterministic tool outputs for reliable reasoning. We evaluate multiple LLMs on TravelBench and conduct an analysis of their behaviors and performance. TravelBench offers a practical and reproducible benchmark for advancing LLM agents in travel planning.




Abstract:There has been significant recent interest in understanding the capacity of Transformers for in-context learning (ICL), yet most theory focuses on supervised settings with explicitly labeled pairs. In practice, Transformers often perform well even when labels are sparse or absent, suggesting crucial structure within unlabeled contextual demonstrations. We introduce and study in-context semi-supervised learning (IC-SSL), where a small set of labeled examples is accompanied by many unlabeled points, and show that Transformers can leverage the unlabeled context to learn a robust, context-dependent representation. This representation enables accurate predictions and markedly improves performance in low-label regimes, offering foundational insights into how Transformers exploit unlabeled context for representation learning within the ICL framework.




Abstract:With the explosive growth of connected devices and emerging applications, current wireless networks are encountering unprecedented demands for massive user access, where the inter-user interference has become a critical challenge to maintaining high quality of service (QoS) in multi-user communication systems. To tackle this issue, we propose a bandwidth-efficient semantic communication paradigm termed Non-Orthogonal Codewords for Semantic Communication (NOC4SC), which enables simultaneous same-frequency transmission without spectrum spreading. By leveraging the Swin Transformer, the proposed NOC4SC framework enables each user to independently extract semantic features through a unified encoder-decoder architecture with shared network parameters across all users, which ensures that the user's data remains protected from unauthorized decoding. Furthermore, we introduce an adaptive NOC and SNR Modulation (NSM) block, which employs deep learning to dynamically regulate SNR and generate approximately orthogonal semantic features within distinct feature subspaces, thereby effectively mitigating inter-user interference. Extensive experiments demonstrate the proposed NOC4SC achieves comparable performance to the DeepJSCC-PNOMA and outperforms other multi-user SemCom baseline methods.
Abstract:With the evolution of 6G networks, modern communication systems are facing unprecedented demands for high reliability and low latency. However, conventional transport protocols are designed for bit-level reliability, failing to meet the semantic robustness requirements. To address this limitation, this paper proposes a novel Semantic Information Transport Protocol (SITP), which achieves TCP-level reliability and UDP level latency by verifying only packet headers while retaining potentially corrupted payloads for semantic decoding. Building upon SITP, a cross-layer analytical model is established to quantify packet-loss probability across the physical, data-link, network, transport, and application layers. The model provides a unified probabilistic formulation linking signal noise rate (SNR) and packet-loss rate, offering theoretical foundation into end-to-end semantic transmission. Furthermore, a cross-image feature interleaving mechanism is developed to mitigate consecutive burst losses by redistributing semantic features across multiple correlated images, thereby enhancing robustness in burst-fade channels. Extensive experiments show that SITP offers lower latency than TCP with comparable reliability at low SNRs, while matching UDP-level latency and delivering superior reconstruction quality. In addition, the proposed cross-image semantic interleaving mechanism further demonstrates its effectiveness in mitigating degradation caused by bursty packet losses.