Abstract:Tokens are fundamental processing units of generative AI (GenAI) and large language models (LLMs), and token communication (TC) is essential for enabling remote AI-generated content (AIGC) and wireless LLM applications. Unlike traditional bits, which are treated independently, the semantics of each token depends on its surrounding context tokens. This inter-token dependency makes TC vulnerable to outage channels, where the loss of a single token can significantly distort the original message semantics. Motivated by this, this paper focuses on optimizing token packetization to maximize the average token similarity (ATS) between the original and received token messages under outage channels. Due to inter-token dependency, this token grouping problem is combinatorial, with complexity growing exponentially with message length. To address this, we propose a novel framework of semantic packet aggregation with lookahead search (SemPA-Look), built on two core ideas. First, it introduces the residual semantic score (RSS) as a token-level surrogate for the message-level ATS, allowing robust semantic preservation even when a certain token packet is lost. Second, instead of full search, SemPA-Look applies a lookahead search-inspired algorithm that samples intra-packet token candidates without replacement (fixed depth), conditioned on inter-packet token candidates sampled with replacement (fixed width), thereby achieving linear complexity. Experiments on a remote AIGC task with the MS-COCO dataset (text-captioned images) demonstrate that SemPA-Look achieves high ATS and LPIPS scores comparable to exhaustive search, while reducing computational complexity by up to 40$\times$. Compared to other linear-complexity algorithms such as the genetic algorithm (GA), SemPA-Look achieves 10$\times$ lower complexity, demonstrating its practicality for remote AIGC and other TC applications.
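To make the objective above concrete, the following is a minimal sketch of how an expected message similarity could be evaluated for one candidate packetization when each packet is lost independently. It is not the SemPA-Look algorithm and it does not compute the paper's ATS or RSS: the function names, the independent-outage assumption, and the Jaccard token overlap used as a stand-in similarity are all assumptions made for illustration.

```python
from itertools import product


def jaccard(original, received):
    """Stand-in similarity (assumption): overlap between token sets."""
    a, b = set(original), set(received)
    return len(a & b) / len(a | b) if a | b else 1.0


def expected_similarity(packets, outage_prob, similarity=jaccard):
    """Average similarity over all packet-loss patterns.

    Assumes every packet is lost independently with probability
    outage_prob. Enumerates 2**len(packets) patterns, so this is only
    practical for a handful of packets.
    """
    original = [token for packet in packets for token in packet]
    total = 0.0
    for delivered in product([True, False], repeat=len(packets)):
        prob = 1.0
        received = []
        for ok, packet in zip(delivered, packets):
            prob *= (1.0 - outage_prob) if ok else outage_prob
            if ok:
                received.extend(packet)
        total += prob * similarity(original, received)
    return total


# Hypothetical caption split into two packets of two tokens each.
grouping = [["red", "car"], ["on", "road"]]
score = expected_similarity(grouping, outage_prob=0.5)
```

With a context-aware similarity in place of the token overlap, different groupings of the same tokens would generally score differently, which is what makes the grouping problem combinatorial.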
Abstract:To support emerging language-based applications using dispersed and heterogeneous computing resources, the hybrid language model (HLM) offers a promising architecture, where an on-device small language model (SLM) generates draft tokens that are validated and corrected by a remote large language model (LLM). However, the original HLM suffers from substantial communication overhead, as the LLM requires the SLM to upload the full vocabulary distribution for each token. Moreover, both communication and computation resources are wasted when the LLM validates tokens that are highly likely to be accepted. To overcome these limitations, we propose a communication-efficient and uncertainty-aware HLM (CU-HLM). In CU-HLM, the SLM transmits truncated vocabulary distributions only when its output uncertainty is high. We validate the feasibility of this opportunistic transmission by discovering a strong correlation between the SLM's uncertainty and the LLM's rejection probability. Furthermore, we theoretically derive optimal uncertainty thresholds and optimal vocabulary truncation strategies. Simulation results show that, compared to standard HLM, CU-HLM achieves up to 206$\times$ higher token throughput by skipping 74.8% of transmissions with 97.4% vocabulary compression, while maintaining 97.4% accuracy.
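The opportunistic transmission rule described above can be sketched as follows. This is an illustration under stated assumptions, not the CU-HLM implementation: the abstract does not specify the uncertainty measure, so Shannon entropy is assumed here, and the names `maybe_upload`, `threshold`, and `k` are hypothetical. The threshold and truncation size are free parameters rather than the optimal values derived in the paper.

```python
import math


def entropy(probs):
    """Shannon entropy in nats (assumed uncertainty measure)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def truncate_top_k(probs, k):
    """Keep the k most probable token ids and renormalize their mass."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return {i: probs[i] / mass for i in top}


def maybe_upload(probs, threshold, k):
    """Return a truncated distribution to upload, or None to skip.

    The draft token is kept locally (no upload) when the SLM is
    confident, i.e. when its entropy does not exceed the threshold.
    """
    if entropy(probs) <= threshold:
        return None
    return truncate_top_k(probs, k)


confident = [0.97, 0.01, 0.01, 0.01]   # low entropy: skip the upload
uncertain = [0.40, 0.30, 0.20, 0.10]   # high entropy: upload top-k only
skipped = maybe_upload(confident, threshold=0.5, k=2)
payload = maybe_upload(uncertain, threshold=0.5, k=2)
```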
Abstract:Token communication (TC) is poised to play a pivotal role in emerging language-driven applications such as AI-generated content (AIGC) and wireless large language models (LLMs). However, token loss caused by channel noise can severely degrade task performance. To address this, in this article, we focus on the problem of semantics-aware packetization and develop a novel algorithm, termed semantic packet aggregation with genetic beam search (SemPA-GBeam), which aims to maximize the average token similarity (ATS) over erasure channels. Inspired by the genetic algorithm (GA) and the beam search algorithm, SemPA-GBeam iteratively optimizes token grouping for packetization within a fixed number of groups (i.e., fixed beam width in beam search) while randomly swapping a fraction of tokens (i.e., mutation in GA). Experiments on the MS-COCO dataset demonstrate that SemPA-GBeam achieves ATS and LPIPS scores comparable to exhaustive search while reducing complexity by more than 20$\times$.
Abstract:Text-based communication is expected to be prevalent in 6G applications such as wireless AI-generated content (AIGC). Motivated by this, this paper addresses the challenges of transmitting text prompts over erasure channels for a text-to-image AIGC task by developing the semantic segmentation and repeated transmission (SMART) algorithm. SMART groups words in text prompts into packets, prioritizing the task-specific significance of semantics within these packets, and optimizes the number of repeated transmissions. Simulation results show that SMART achieves higher similarities in received texts and generated images compared to a character-level packetization baseline, while reducing computing latency by orders of magnitude compared to an exhaustive search baseline.
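As a small worked example of why repeated transmission helps over an erasure channel: if each copy of a packet is erased independently with probability $\epsilon$, at least one of $r$ copies arrives with probability $1 - \epsilon^r$. The sketch below finds the smallest $r$ meeting a target delivery probability. It is not the SMART algorithm, which also weighs the semantic significance of each packet; the independence assumption and the function names are illustrative.

```python
def delivery_probability(erasure_prob, repeats):
    """Probability that at least one of `repeats` copies is received,
    assuming independent erasures."""
    return 1.0 - erasure_prob ** repeats


def min_repeats(erasure_prob, target):
    """Smallest number of copies whose delivery probability >= target."""
    if not 0.0 <= erasure_prob < 1.0:
        raise ValueError("erasure_prob must be in [0, 1)")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    repeats = 1
    while delivery_probability(erasure_prob, repeats) < target:
        repeats += 1
    return repeats


# With a 50% erasure rate, 4 copies give 1 - 0.5**4 = 0.9375 >= 0.9.
needed = min_repeats(erasure_prob=0.5, target=0.9)
```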
Abstract:Carrier-sense multiple access with collision avoidance in Wi-Fi often leads to contention and interference, thereby increasing packet losses. These challenges have traditionally been modeled as a graph, with stations (STAs) represented as vertices and contention or interference as edges. Graph coloring assigns orthogonal transmission slots to STAs, managing contention and interference, e.g., using the restricted target wake time (RTWT) mechanism introduced in Wi-Fi 7 standards. However, legacy graph models lack flexibility in optimizing these assignments, often failing to minimize slot usage while maintaining reliable transmissions. To address this issue, we propose ScNeuGM, a neural graph modeling (NGM) framework that flexibly trains a neural network (NN) to construct optimal graph models whose coloring corresponds to optimal slot assignments. ScNeuGM is highly scalable to large Wi-Fi networks with a massive number of STA pairs: 1) it utilizes an evolution strategy (ES) to directly optimize the NN parameters based on one network-wise reward signal, avoiding exhaustive edge-wise feedback estimations in all STA pairs; 2) it leverages a deep hashing function (DHF) to group contending or interfering STA pairs and restricts NGM NN training and inference to pairs within these groups, significantly reducing complexity. Simulations show that the ES-trained NN in ScNeuGM returns near-optimal graphs 4-10 times more often than algorithms requiring edge-wise feedback and uses 25\% fewer slots than legacy graph constructions. Furthermore, the DHF in ScNeuGM reduces the training and inference time of NGM by 4 and 8 times, respectively, reduces the online slot assignment time by 3 times in large networks, and yields up to 30\% fewer packet losses in dynamic scenarios thanks to timely assignments.
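The link between graph coloring and slot assignment mentioned above can be illustrated with a generic greedy coloring heuristic. This is not ScNeuGM and does not construct the graph with a neural network: the conflict graph is given by hand, and the STA names and the function name are hypothetical. Each color is one orthogonal transmission slot, so neighboring STAs never share a slot.

```python
def greedy_slot_assignment(conflicts):
    """Assign each STA the smallest slot index unused by its neighbors.

    `conflicts` maps each STA to the set of STAs it contends or
    interferes with (symmetric adjacency). STAs with more conflicts
    are colored first, a common greedy ordering.
    """
    slots = {}
    order = sorted(conflicts, key=lambda s: len(conflicts[s]), reverse=True)
    for sta in order:
        used = {slots[n] for n in conflicts[sta] if n in slots}
        slot = 0
        while slot in used:
            slot += 1
        slots[sta] = slot
    return slots


# Hypothetical network: A, B, C mutually conflict; D conflicts with C only.
conflicts = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
assignment = greedy_slot_assignment(conflicts)
num_slots = max(assignment.values()) + 1
```

Greedy coloring gives a valid assignment but not necessarily the minimum number of slots, which is why the quality of the underlying graph model matters.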
Abstract:Wireless time-sensitive networking (WTSN) is essential for the Industrial Internet of Things. We address the problem of minimizing time slots needed for WTSN transmissions while ensuring reliability subject to interference constraints -- an NP-hard task. Existing semidefinite programming (SDP) methods can relax and solve the problem but suffer from high polynomial complexity. We propose a sparse interference graph-aided SDP (SIG-SDP) framework that exploits the interference's sparsity arising from attenuated signals between distant user pairs. First, the framework utilizes the sparsity to establish the upper and lower bounds of the minimum number of slots and uses binary search to locate the minimum within the bounds. Here, for each searched slot number, the framework optimizes a positive semidefinite (PSD) matrix indicating how likely user pairs share the same slot, and the constraint feasibility with the optimized PSD matrix further refines the slot search range. Second, the framework designs a matrix multiplicative weights (MMW) algorithm that accelerates the optimization, achieved by only sparsely adjusting interfering user pairs' elements in the PSD matrix while skipping the non-interfering pairs. We also design an online architecture to deploy the framework to adjust slot assignments based on real-time interference measurements. Simulations show that the SIG-SDP framework converges in near-linear complexity and is highly scalable to large networks. The framework minimizes the number of slots with up to 10 times faster computation and up to 100 times lower packet loss rates than the compared methods. The online architecture demonstrates how the algorithm complexity impacts dynamic networks' performance.
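The binary search over the slot count described above can be sketched as follows. The feasibility check is left as a hypothetical callable `is_feasible`; in the paper it involves optimizing a PSD matrix, which is not reproduced here. The sketch assumes that feasibility is monotone (if k slots suffice, so do k + 1) and that the upper bound is itself feasible.

```python
def min_feasible_slots(lower, upper, is_feasible):
    """Smallest slot count k in [lower, upper] with is_feasible(k) True.

    Assumes monotone feasibility and that `upper` is feasible, so the
    search range can be halved on every check.
    """
    while lower < upper:
        mid = (lower + upper) // 2
        if is_feasible(mid):
            upper = mid          # mid works; try fewer slots
        else:
            lower = mid + 1      # mid fails; more slots are needed
    return lower


# Toy oracle (assumption): at least 4 slots are needed.
best = min_feasible_slots(1, 10, lambda k: k >= 4)
```

Tighter bounds shrink the number of feasibility checks, which grows only logarithmically with the width of the search range.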
Abstract:This paper proposes a novel digital deep joint source-channel coding (DeepJSCC) framework that achieves robust performance across diverse communication environments without requiring extensive retraining or prior knowledge of communication environments. Traditional digital DeepJSCC techniques often face challenges in adapting to various communication environments, as they require significant training overhead and large amounts of communication data to develop either multiple specialized models or a single generalized model in pre-defined communication environments. To address this challenge, our framework devises an error-adaptive blind training strategy, which eliminates the need for prior knowledge of communication environments. This is achieved by modeling the relationship between the encoder's output and the decoder's input using binary symmetric channels, and optimizing bit-flip probabilities by treating them as trainable parameters. Our framework also presents a training-aware communication strategy, which dynamically selects the optimal encoder-decoder pair and transmission parameters based on current channel conditions. In particular, this strategy includes an adaptive power and modulation control method that minimizes the total transmission power while maintaining high task performance. Simulation results demonstrate that our framework outperforms existing DeepJSCC methods, achieving higher peak signal-to-noise ratio, lower power consumption, and requiring significantly fewer encoder-decoder pairs for adaptation.
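For readers unfamiliar with the channel model named above, a binary symmetric channel flips each transmitted bit independently with a fixed probability. The sketch below only simulates such a channel with a fixed flip probability; it does not implement the paper's training strategy, in which the bit-flip probabilities are trainable parameters. The function name and the seeded generator are illustrative choices.

```python
import random


def binary_symmetric_channel(bits, flip_prob, rng):
    """Flip each bit independently with probability flip_prob."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]


rng = random.Random(0)                      # seeded for repeatability
encoder_output = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical encoder bits
decoder_input = binary_symmetric_channel(encoder_output, 0.1, rng)
bit_errors = sum(a != b for a, b in zip(encoder_output, decoder_input))
```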
Abstract:Adapting foundation models for specific purposes has become a standard approach to build machine learning systems for downstream applications. Yet, it is an open question which mechanisms take place during adaptation. Here we develop a new Sparse Autoencoder (SAE) for the CLIP vision transformer, named PatchSAE, to extract interpretable concepts at granular levels (e.g. shape, color, or semantics of an object) and their patch-wise spatial attributions. We explore how these concepts influence the model output in downstream image classification tasks and investigate how recent state-of-the-art prompt-based adaptation techniques change the association of model inputs to these concepts. While activations of concepts slightly change between adapted and non-adapted models, we find that the majority of gains on common adaptation tasks can be explained with the existing concepts already present in the non-adapted foundation model. This work provides a concrete framework to train and use SAEs for Vision Transformers and provides insights into explaining adaptation mechanisms.
Abstract:This study introduces an innovative approach for adaptive power allocation in Non-Orthogonal Multiple Access (NOMA) systems, enhanced by the integration of spaceborne and terrestrial signals through a Reconfigurable Intelligent Surface (RIS). We develop an adaptive mechanism to adjust the power distribution between spaceborne and terrestrial signals according to variations in environmental conditions and elevation angles. This mechanism employs a sophisticated transition model that combines Gaussian Mixture Models with Log-Normal distributions to adaptively counteract the detrimental impacts of atmospheric attenuation and urban shadowing. These adaptive power adjustments significantly enhance system capacity, particularly improving the Signal-to-Interference-plus-Noise Ratio under diverse operational scenarios. Simulation studies confirm the efficacy of our method within an RIS-enhanced framework, showing an approximate 20\% increase in system capacity through optimized power management between spaceborne and terrestrial signals.
Abstract:Low Probability of Detection (LPD) communication aims to obscure the presence of radio frequency (RF) signals to evade surveillance. In the context of mobile surveillance utilizing unmanned aerial vehicles (UAVs), achieving LPD communication presents significant challenges due to the UAVs' rapid and continuous movements, which are characterized by unknown nonlinear dynamics. Therefore, accurately predicting future locations of UAVs is essential for enabling real-time LPD communication. In this paper, we introduce a novel framework termed predictive covert communication, aimed at minimizing detectability in terrestrial ad-hoc networks under multi-UAV surveillance. Our data-driven method integrates graph neural networks (GNNs) with Koopman theory to model the complex interactions within a multi-UAV network and to facilitate long-term predictions by linearizing the dynamics, even with limited historical data. Extensive simulation results substantiate that the trajectories predicted by our method result in at least 63%-75% lower probability of detection when compared to well-known state-of-the-art baseline approaches, showing promise in enabling low-latency covert operations in practical scenarios.