Sherman
Abstract:Low-altitude wireless networks (LAWNs) have been envisioned as flexible and transformative platforms for enabling delay-sensitive control applications in Internet of Things (IoT) systems. In this work, we investigate the real-time wireless control over a LAWN system, where an aerial drone is employed to serve multiple mobile automated guided vehicles (AGVs) via finite blocklength (FBL) transmission. Toward this end, we adopt the model predictive control (MPC) to ensure accurate trajectory tracking, while we analyze the communication reliability using the outage probability. Subsequently, we formulate an optimization problem to jointly determine control policy, transmit power allocation, and drone trajectory by accounting for the maximum travel distance and control input constraints. To address the resultant non-convex optimization problem, we first derive the closed-form expression of the outage probability under FBL transmission. Based on this, we reformulate the original problem as a quadratic programming (QP) problem, followed by developing an alternating optimization (AO) framework. Specifically, we employ the projected gradient descent (PGD) method and the successive convex approximation (SCA) technique to achieve computationally efficient sub-optimal solutions. Furthermore, we thoroughly analyze the convergence and computational complexity of the proposed algorithm. Extensive simulations and AirSim-based experiments are conducted to validate the superiority of our proposed approach compared to the baseline schemes in terms of control performance.
Abstract:This letter proposes UniToCom, a unified token communication paradigm that treats tokens as the fundamental units for both processing and wireless transmission. Specifically, to enable efficient token representations, we propose a generative information bottleneck (GenIB) principle, which facilitates the learning of tokens that preserve essential information while supporting reliable generation across multiple modalities. By doing this, GenIB-based tokenization is conducive to improving the communication efficiency and reducing computational complexity. Additionally, we develop $\sigma$-GenIB to address the challenges of variance collapse in autoregressive modeling, maintaining representational diversity and stability. Moreover, we employ a causal Transformer-based multimodal large language model (MLLM) at the receiver to unify the processing of both discrete and continuous tokens under the next-token prediction paradigm. Simulation results validate the effectiveness and superiority of the proposed UniToCom compared to baselines under dynamic channel conditions. By integrating token processing with MLLMs, UniToCom enables scalable and generalizable communication in favor of multimodal understanding and generation, providing a potential solution for next-generation intelligent communications.
Abstract:Semantic communication (SemCom) powered by generative artificial intelligence enables highly efficient and reliable information transmission. However, it still necessitates the transmission of substantial amounts of data when dealing with complex scene information. In contrast, the stacked intelligent metasurface (SIM), leveraging wave-domain computing, provides a cost-effective solution for directly imaging complex scenes. Building on this concept, we propose an innovative SIM-aided multi-modal SemCom system. Specifically, an SIM is positioned in front of the transmit antenna for transmitting visual semantic information of complex scenes via imaging on the uniform planar array at the receiver. Furthermore, the simple scene description that contains textual semantic information is transmitted via amplitude-phase modulation over electromagnetic waves. To simultaneously transmit multi-modal information, we optimize the amplitude and phase of meta-atoms in the SIM using a customized gradient descent algorithm. The optimization aims to gradually minimize the mean squared error between the normalized energy distribution on the receiver array and the desired pattern corresponding to the visual semantic information. By combining the textual and visual semantic information, a conditional generative adversarial network is used to recover the complex scene accurately. Extensive numerical results verify the effectiveness of the proposed multi-modal SemCom system in reducing bandwidth overhead as well as the capability of the SIM for imaging the complex scene.
Abstract:In this article, we introduce a novel low-altitude wireless network (LAWN), which is a reconfigurable, three-dimensional (3D) layered architecture. In particular, the LAWN integrates connectivity, sensing, control, and computing across aerial and terrestrial nodes that enable seamless operation in complex, dynamic, and mission-critical environments. In this article, we introduce a novel low-altitude wireless network (LAWN), which is a reconfigurable, three-dimensional (3D) layered architecture. Different from the conventional aerial communication systems, LAWN's distinctive feature is its tight integration of functional planes in which multiple functionalities continually reshape themselves to operate safely and efficiently in the low-altitude sky. With the LAWN, we discuss several enabling technologies, such as integrated sensing and communication (ISAC), semantic communication, and fully-actuated control systems. Finally, we identify potential applications and key cross-layer challenges. This article offers a comprehensive roadmap for future research and development in the low-altitude airspace.
Abstract:In this paper, a novel semantic communication framework empowered by generative artificial intelligence (GAI) is proposed, specifically leveraging the capabilities of diffusion models (DMs). A rigorous theoretical foundation is established based on stochastic differential equations (SDEs), which elucidates the denoising properties of DMs in mitigating additive white Gaussian noise (AWGN) in latent semantic representations. Crucially, a closed-form analytical relationship between the signal-to-noise ratio (SNR) and the denoising timestep is derived, enabling the optimal selection of diffusion parameters for any given channel condition. To address the distribution mismatch between the received signal and the DM's training data, a mathematically principled scaling mechanism is introduced, ensuring robust performance across a wide range of SNRs without requiring model fine-tuning. Built upon this theoretical insight, we develop a latent diffusion model (LDM)-based semantic transceiver, wherein a variational autoencoder (VAE) is employed for efficient semantic compression, and a pretrained DM serves as a universal denoiser. Notably, the proposed architecture is fully training-free at inference time, offering high modularity and compatibility with large-scale pretrained LDMs. This design inherently supports zero-shot generalization and mitigates the challenges posed by out-of-distribution inputs. Extensive experimental evaluations demonstrate that the proposed framework significantly outperforms conventional neural-network-based semantic communication baselines, particularly under low SNR conditions and distributional shifts, thereby establishing a promising direction for GAI-driven robust semantic transmission in future 6G systems.
Abstract:Recently, a novel flexible-antenna technology, called pinching antennas, has attracted growing academic interest. By inserting discrete dielectric materials, pinching antennas can be activated at arbitrary points along waveguides, allowing for flexible customization of large-scale path loss. This paper investigates a multi-waveguide pinching-antenna integrated sensing and communications (ISAC) system, where transmit pinching antennas (TPAs) and receive pinching antennas (RPAs) coordinate to simultaneously detect one potential target and serve one downlink user. We formulate a communication rate maximization problem subject to radar signal-to-noise ratio (SNR) requirement, transmit power budget, and the allowable movement region of the TPAs, by jointly optimizing TPA locations and transmit beamforming design. To address the non-convexity of the problem, we propose a novel fine-tuning approximation method to reformulate it into a tractable form, followed by a successive convex approximation (SCA)-based algorithm to obtain the solution efficiently. Extensive simulations validate both the system design and the proposed algorithm. Results show that the proposed method achieves near-optimal performance compared with the computational-intensive exhaustive search-based benchmark, and pinching-antenna ISAC systems exhibit a distinct communication-sensing trade-off compared with conventional systems.
Abstract:In next-generation wireless networks, supporting real-time applications such as augmented reality, autonomous driving, and immersive Metaverse services demands stringent constraints on bandwidth, latency, and reliability. Existing semantic communication (SemCom) approaches typically rely on static models, overlooking dynamic conditions and contextual cues vital for efficient transmission. To address these challenges, we propose CaSemCom, a context-aware SemCom framework that leverages a Large Language Model (LLM)-based gating mechanism and a Mixture of Experts (MoE) architecture to adaptively select and encode only high-impact semantic features across multiple data modalities. Our multimodal, multi-user case study demonstrates that CaSemCom significantly improves reconstructed image fidelity while reducing bandwidth usage, outperforming single-agent deep reinforcement learning (DRL) methods and traditional baselines in convergence speed, semantic accuracy, and retransmission overhead.
Abstract:This paper focuses on Zero-Trust Foundation Models (ZTFMs), a novel paradigm that embeds zero-trust security principles into the lifecycle of foundation models (FMs) for Internet of Things (IoT) systems. By integrating core tenets, such as continuous verification, least privilege access (LPA), data confidentiality, and behavioral analytics into the design, training, and deployment of FMs, ZTFMs can enable secure, privacy-preserving AI across distributed, heterogeneous, and potentially adversarial IoT environments. We present the first structured synthesis of ZTFMs, identifying their potential to transform conventional trust-based IoT architectures into resilient, self-defending ecosystems. Moreover, we propose a comprehensive technical framework, incorporating federated learning (FL), blockchain-based identity management, micro-segmentation, and trusted execution environments (TEEs) to support decentralized, verifiable intelligence at the network edge. In addition, we investigate emerging security threats unique to ZTFM-enabled systems and evaluate countermeasures, such as anomaly detection, adversarial training, and secure aggregation. Through this analysis, we highlight key open research challenges in terms of scalability, secure orchestration, interpretable threat attribution, and dynamic trust calibration. This survey lays a foundational roadmap for secure, intelligent, and trustworthy IoT infrastructures powered by FMs.
Abstract:Task-oriented semantic communication enhances transmission efficiency by conveying semantic information rather than exact messages. Deep learning (DL)-based semantic communication can effectively cultivate the essential semantic knowledge for semantic extraction, transmission, and interpretation by leveraging massive labeled samples for downstream task training. In this paper, we propose a self-supervised learning-based semantic communication framework (SLSCom) to enhance task inference performance, particularly in scenarios with limited access to labeled samples. Specifically, we develop a task-relevant semantic encoder using unlabeled samples, which can be collected by devices in real-world edge networks. To facilitate task-relevant semantic extraction, we introduce self-supervision for learning contrastive features and formulate the information bottleneck (IB) problem to balance the tradeoff between the informativeness of the extracted features and task inference performance. Given the computational challenges of the IB problem, we devise a practical and effective solution by employing self-supervised classification and reconstruction pretext tasks. We further propose efficient joint training methods to enhance end-to-end inference accuracy over wireless channels, even with few labeled samples. We evaluate the proposed framework on image classification tasks over multipath wireless channels. Extensive simulation results demonstrate that SLSCom significantly outperforms conventional digital coding methods and existing DL-based approaches across varying labeled data set sizes and SNR conditions, even when the unlabeled samples are irrelevant to the downstream tasks.
Abstract:This paper introduces a two-stage generative AI (GenAI) framework tailored for temporal spectrum cartography in low-altitude economy networks (LAENets). LAENets, characterized by diverse aerial devices such as UAVs, rely heavily on wireless communication technologies while facing challenges, including spectrum congestion and dynamic environmental interference. Traditional spectrum cartography methods have limitations in handling the temporal and spatial complexities inherent to these networks. Addressing these challenges, the proposed framework first employs a Reconstructive Masked Autoencoder (RecMAE) capable of accurately reconstructing spectrum maps from sparse and temporally varying sensor data using a novel dual-mask mechanism. This approach significantly enhances the precision of reconstructed radio frequency (RF) power maps. In the second stage, the Multi-agent Diffusion Policy (MADP) method integrates diffusion-based reinforcement learning to optimize the trajectories of dynamic UAV sensors. By leveraging temporal-attention encoding, this method effectively manages spatial exploration and exploitation to minimize cumulative reconstruction errors. Extensive numerical experiments validate that this integrated GenAI framework outperforms traditional interpolation methods and deep learning baselines by achieving 57.35% and 88.68% reconstruction error reduction, respectively. The proposed trajectory planner substantially improves spectrum map accuracy, reconstruction stability, and sensor deployment efficiency in dynamically evolving low-altitude environments.