Abstract:Pinching antenna (PA) systems provide a new spatial degree of freedom through the flexible activation of pinching positions. However, the resulting effective channel depends strongly on the activated pinching positions, so conventional coherent transmission generally relies on accurate acquisition of instantaneous channel state information (CSI) and incurs substantial pilot overhead. To address this challenge, we propose a differential spatial modulation (DSM) scheme for PA systems, termed DSM-PA. Specifically, a differential transmission scheme with an embedded Alamouti coding structure is designed, in which information bits are conveyed via phase variations between adjacent symbol blocks. This design enables noncoherent transmission without instantaneous CSI while simultaneously achieving transmit diversity. Moreover, to fully exploit the spatial degrees of freedom of PA systems, a pinching position-based index modulation (IM) rule is developed to enhance spectral efficiency. An asymptotically tight upper bound on the average bit error rate (BER) over quasi-static Rician fading channels is derived using the moment-generating function (MGF) method. The diversity analysis further reveals that the proposed DSM-PA scheme achieves full transmit diversity. Finally, simulation results verify the accuracy of the BER analysis and demonstrate the effectiveness of the proposed DSM-PA scheme.
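The differential transmission idea above can be sketched generically. The following is a minimal illustration of differential space-time block coding with an Alamouti structure and noncoherent block-to-block detection, using QPSK for concreteness; the PA index-modulation layer and the paper's exact bit mapping are omitted, and all function names are illustrative.

```python
import numpy as np

PSK = np.exp(2j * np.pi * np.arange(4) / 4)  # unit-modulus QPSK alphabet

def alamouti(s1, s2):
    """Unitary 2x2 Alamouti block built from two unit-modulus symbols."""
    return np.array([[s1, s2],
                     [-np.conj(s2), np.conj(s1)]]) / np.sqrt(2)

def diff_encode(symbol_pairs):
    """Differential encoding: each transmitted block is a unitary Alamouti
    'information' matrix times the previously transmitted block, so data
    lives in the variation between adjacent blocks, not in the blocks."""
    X = np.eye(2, dtype=complex)  # known reference block
    tx = [X]
    for s1, s2 in symbol_pairs:
        X = alamouti(s1, s2) @ X
        tx.append(X)
    return tx

def diff_decode(rx):
    """Noncoherent detection: for each pair of consecutive received blocks
    Y_prev, Y_cur with Y_cur ~ C Y_prev, pick the candidate Alamouti matrix
    C maximizing Re tr(Y_prev^H C^H Y_cur); no channel knowledge is used."""
    cands = [(s1, s2) for s1 in PSK for s2 in PSK]
    out = []
    for Yp, Yc in zip(rx[:-1], rx[1:]):
        best = max(cands, key=lambda c: np.real(
            np.trace(Yp.conj().T @ alamouti(*c).conj().T @ Yc)))
        out.append(best)
    return out
```

In a noiseless channel the detector recovers the symbol pairs exactly, since the difference of two distinct Alamouti matrices is always invertible.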
Abstract:Understanding the wireless spectrum is a fundamental requirement for intelligent communication systems; however, interpreting spectrograms requires extracting multiple physical attributes and reasoning about signal structure, a capability not achieved by traditional ML approaches. Recent advances in vision-language models (VLMs) have demonstrated the possibility of learning such interpretation capabilities directly from data. This paper investigates whether VLMs can learn this capability from synthetic data alone and, more importantly, whether such learned representations generalize to real over-the-air RF environments. To address this question, we introduce RF-Analyzer, an SDR-to-AI analysis platform that integrates live spectrum captures with the corresponding VLM-based interpretation, enabling direct evaluation of VLM outputs on live over-the-air signals. Using this platform, we compare a model trained exclusively on synthetic spectrogram data against general-purpose baselines. To enable systematic analysis, we establish a benchmark framework comprising three metrics, the Physical Attribute Extraction Score (PAES), the Prompt Leakage Rate (PLR), and hallucination count, to assess signal understanding and grounding. The obtained results demonstrate that VLMs trained on synthetic spectrogram data can generalize to real RF environments, particularly for extracting physical signal attributes such as spectral occupancy, temporal behavior, and SNR. This indicates that synthetic data is sufficient for learning transferable representations of RF signal structure. However, this generalization is limited because synthetic training does not provide reliable semantic grounding without contextual priors. In particular, generalization breaks down under conditions not covered by the synthetic distribution, most notably low-SNR regimes.
Abstract:Acquiring channel state information from limited and noisy observations at pilot positions is critical for wireless multiple-input multiple-output (MIMO)-orthogonal frequency division multiplexing (OFDM) systems. In this paper, we view this process as a conditional generative task in which the partial noisy channel estimates at the pilots serve as a ``prompt'' to guide the diffusion ``inpainting'' of the underlying channel. To this end, we adopt a general Conditional Diffusion Transformer (CDiT) framework with a well-designed network architecture and update rule. In particular, we design a dedicated embedding strategy to encode and adapt to different pilot patterns and noise levels, and utilize a special cross-attention mechanism to align the partial raw channel observations with the denoised channel at each time step of the generation process. This architecture effectively anchors the diffusion process, enabling the model to accurately recover full channel details from limited noisy observations. Comprehensive experimental results show that the proposed approach achieves a performance gain of over 5 dB compared to the baselines under varying noise conditions, and provides robust channel acquisition even under a sparse pilot density of 1/32 without significant performance loss relative to denser pilot cases. Moreover, it is capable of generating high-quality channel matrices within just 10 inference steps, effectively balancing estimation accuracy with computational efficiency and inference speed. Ablation studies demonstrate the rationality of the model design and the necessity of its modules.
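The pilot-anchored inpainting idea can be illustrated with a minimal sketch. Note that the paper aligns observations via cross-attention inside the network; the hard re-imposition below is the simpler, generic RePaint-style alternative, and all names and the toy denoiser are hypothetical.

```python
import numpy as np

def pilot_anchored_step(x_est, pilot_mask, pilot_obs, denoise_step):
    """One conditioning step of diffusion inpainting: apply a denoising
    update to the current channel estimate, then re-impose the (noisy)
    measurements at pilot positions so the generation stays anchored to
    the observed entries while the rest is filled in by the model."""
    x = denoise_step(x_est)
    return np.where(pilot_mask, pilot_obs, x)
```

Iterating this step over the diffusion schedule keeps every intermediate estimate consistent with the pilot observations while the unobserved entries are progressively denoised.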
Abstract:Future wireless systems increasingly require predictive and transferable representations that can support multiple physical-layer (PHY) tasks under dynamic environments. However, most existing supervised learning-based methods are designed for a single task, which leads to high adaptation cost. To address this issue, we propose a joint-embedding predictive architecture for multimodal sensing-assisted communications (JEPA-MSAC), a self-supervised multimodal predictive representation learning framework for wireless environments. The proposed framework first maps multimodal sensing and communication measurements into a unified token space, and then pretrains a shared backbone using temporal block-masked JEPA to learn a predictive latent space that captures environment dynamics and cross-modal dependencies. After pretraining, the backbone is frozen and reused as a general future-feature generator, on top of which lightweight task heads are trained for localization, beam prediction, and received signal strength indicator (RSSI) prediction. Extensive experiments show that the learned latent state supports accurate multi-task prediction with low adaptation cost. Additionally, ablation studies reveal its scaling behavior and the impact of key pretraining setups.
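Temporal block masking, the core of the pretraining objective above, can be sketched as follows. This is a generic JEPA-style illustration; the block length, block count, and sampling policy are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def temporal_block_mask(num_tokens, block_len, num_blocks, rng):
    """Hide contiguous temporal blocks of tokens. During pretraining, a
    predictor must infer the latent features of the masked blocks from
    the visible context tokens (JEPA-style target/context split)."""
    mask = np.zeros(num_tokens, dtype=bool)
    for _ in range(num_blocks):
        start = rng.integers(0, num_tokens - block_len + 1)
        mask[start:start + block_len] = True  # blocks may overlap
    return mask
```

Masking contiguous temporal spans, rather than isolated tokens, forces the backbone to extrapolate environment dynamics instead of interpolating from immediate neighbors.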
Abstract:Current reconfigurable intelligent surface (RIS)-aided near-field (NF) localization methods assume that the RIS position is known a priori, which limits their practical applicability. This paper applies a hybrid RIS (HRIS) at an unknown position to locate non-line-of-sight (NLOS) NF targets. To this end, we first propose a two-stage gridless localization framework that achieves HRIS self-localization and then determines the positions of the NF targets. In the first stage, we use the NF Fresnel approximation to convert the signal model into a virtual far-field model through delay-based cross-correlation of centrally symmetric HRIS elements. Such a conversion naturally extends the aperture of the virtual array. A single-snapshot decoupled atomic norm minimization (DANM) algorithm is then proposed to locate an NF target relative to the HRIS, which includes two-dimensional (2-D) direction-of-arrival (DOA) estimation with automatic pairing, the multiple signal classification (MUSIC) method for range estimation, and a total least squares (TLS) method to eliminate the Fresnel approximation error. In the second stage, we leverage the unique capability of the HRIS for simultaneous sensing and reflection to estimate the HRIS-to-base-station (BS) direction vectors using atomic norm minimization (ANM), and derive the three-dimensional (3-D) HRIS position with two BSs via least squares (LS)-based geometric triangulation. Furthermore, we propose a semidefinite relaxation (SDR)-based HRIS phase optimization method to enhance the received signal power at the BSs, thereby improving the HRIS localization accuracy, which, in turn, enhances NF target positioning. The Cramér-Rao bound (CRB) for the NF target parameters and the position error bound (PEB) for the HRIS coordinates are derived as performance benchmarks.
Abstract:Deep-learning (DL)-based precoding in multi-user multiple-input single-output (MU-MISO) systems involves training DL models to map features derived from channel coefficients to labels derived from precoding weights. Traditionally, complex-valued channel and precoder coefficients are parameterized using either their real and imaginary components or their amplitude and phase. However, precoding performance depends on the magnitudes of inner products between channel and precoding vectors, which are invariant to global phase rotations. Conventional representations fail to exploit this symmetry, leading to inefficient learning and degraded generalization. To address this, we propose a DL framework based on complex projective space (CPS) parameterizations of both the wireless channel and the weighted minimum mean squared error (WMMSE) precoder vectors. By removing the global phase redundancies inherent in conventional representations, the proposed framework enables the DL model to learn geometry-aligned and physically distinct channel-precoder mappings. Two CPS parameterizations based on real-valued embeddings and complex hyperspherical coordinates are investigated and benchmarked against two baseline methods. Simulation results demonstrate substantial improvements in sum-rate performance and generalization, with negligible increase in model complexity.
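The phase-invariance argument can be made concrete with a small sketch. The embedding below picks one canonical representative per CP^{n-1} equivalence class; it is a hypothetical real-valued embedding in the spirit described above, not necessarily the paper's exact parameterization.

```python
import numpy as np

def cps_embed(v):
    """Phase-invariant embedding of a complex vector: normalize to unit
    norm, rotate so the largest-magnitude entry is real and non-negative
    (a canonical representative of the CP^{n-1} equivalence class), then
    stack real and imaginary parts into a real feature vector."""
    v = np.asarray(v, dtype=complex)
    v = v / np.linalg.norm(v)
    k = int(np.argmax(np.abs(v)))
    v = v * np.exp(-1j * np.angle(v[k]))  # cancel the global phase
    return np.concatenate([v.real, v.imag])
```

Because any two vectors differing only by a global phase map to the same embedding, the DL model never has to spend capacity learning that the corresponding channel-precoder pairs are physically identical.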
Abstract:Emerging 6G visions, reflected in ongoing standardization efforts within 3GPP, IETF, ETSI, ITU-T, and the O-RAN Alliance, increasingly characterize networks as AI-native systems in which high-level semantic reasoning layers operate above standardized control and data-plane functions. Although frontier-scale large language models (LLMs) such as Qwen2.5-7B and Olmo-3-7B demonstrate strong reasoning capability, their computational footprint limits deployment in latency-sensitive, edge-native infrastructures. This paper presents a systematic empirical study of the scaling behavior and deployment efficiency of compact language models for network-level semantic reasoning in AI-native 6G systems. Using 6G-Bench, a standardization-aligned benchmark comprising 30 decision-making tasks across five capability domains, we evaluate models ranging from 135M (SmolLM2-135M) to 7B parameters (Qwen2.5-7B), including mid-scale architectures such as Llama-3.2-1B, Granite-1B, and Qwen2.5-3B. Deterministic accuracy (pass@1) increases from 0.224 at 135M to 0.707 at 7B, but scaling gains are highly non-uniform. A pronounced stability transition occurs in the 1 to 1.5B range, where accuracy rises from 0.373 (Llama-3.2-1B) to 0.531 (Qwen2.5-1.5B) and the instability gap Delta_5 contracts from 0.356 to 0.138. Beyond 3B parameters, improvements diminish (+0.064 from 3B to 7B). Through single-query inference profiling and an Edge Score metric that normalizes accuracy by latency and memory footprint, we show that semantic reliability per unit edge resource does not scale monotonically with parameter count. Instead, mid-scale models (approximately 1.5 to 3B) achieve the most favorable balance between deterministic stability and computational efficiency, providing deployment-relevant guidance for AI-native 6G architectures. All scripts and results are publicly available at https://github.com/maferrag/6G-Bench
Abstract:This paper investigates the integration of large language models (LLMs) as reasoning agents in repeated spectrum auctions within heterogeneous networks (HetNets). While auction-based mechanisms have been widely employed for efficient resource allocation, most prior works assume one-shot auctions, static bidder behavior, and idealized conditions. In contrast to traditional formulations where base station (BS) association and power allocation are centrally optimized, we propose a distributed auction-based framework in which each BS independently conducts its own multi-channel auction, and user equipments (UEs) strategically decide both their association and bid values. Within this setting, UEs operate under budget constraints and repeated interactions, transforming resource allocation into a long-term economic decision rather than a one-shot optimization problem. The proposed framework enables the evaluation of diverse bidding behaviors, from classical myopic and greedy policies to LLM-based agents capable of reasoning over historical outcomes, anticipating competition, and adapting their bidding strategy across episodes. Simulation results reveal that the LLM-empowered UE consistently achieves higher channel access frequency and improved budget efficiency compared to benchmarks. These findings highlight the potential of reasoning-enabled agents in future decentralized wireless network markets and pave the way for lightweight, edge-deployable LLMs to support intelligent resource allocation in next-generation HetNets.
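The budget dynamics that make this a long-term decision can be illustrated with a toy repeated first-price auction between two fixed-fraction bidders. This is purely illustrative: the paper's setting involves multiple BSs, multi-channel auctions, and LLM-based strategies, and every parameter below is an assumption.

```python
def run_repeated_auction(valuations, budget, frac_a, frac_b):
    """Two budget-constrained bidders repeatedly contest one channel.
    Each bids a fixed fraction of its valuation, capped by its remaining
    budget; the higher bid wins (first-price) and pays its own bid."""
    wins, budgets = [0, 0], [budget, budget]
    for v in valuations:
        bids = [min(frac_a * v, budgets[0]), min(frac_b * v, budgets[1])]
        w = int(bids[1] > bids[0])  # ties go to bidder A
        if bids[w] > 0:
            wins[w] += 1
            budgets[w] -= bids[w]
    return wins, budgets
```

An aggressive bidder wins the early rounds but exhausts its budget and loses the later ones, which is exactly the kind of horizon-aware trade-off that reasoning agents are meant to navigate.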
Abstract:Open RAN (O-RAN) exposes rich control and telemetry interfaces across the Non-RT RIC, Near-RT RIC, and distributed units, but also makes it harder to operate multi-tenant, multi-objective RANs in a safe and auditable manner. In parallel, agentic AI systems with explicit planning, tool use, memory, and self-management offer a natural way to structure long-lived control loops. This article surveys how such agentic controllers can be brought into O-RAN: we review the O-RAN architecture, contrast agentic controllers with conventional ML/RL xApps, and organise the task landscape around three clusters: network slice life-cycle, radio resource management (RRM) closed loops, and cross-cutting security, privacy, and compliance. We then introduce a small set of agentic primitives (Plan-Act-Observe-Reflect, skills as tool use, memory and evidence, and self-management gates) and show, in a multi-cell O-RAN simulation, how they improve slice life-cycle and RRM performance compared to conventional baselines and ablations that remove individual primitives. Security, privacy, and compliance are discussed as architectural constraints and open challenges for standards-aligned deployments. This framework achieves an average 8.83\% reduction in resource usage across three classic network slices.
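The Plan-Act-Observe-Reflect primitive introduced above can be sketched as a generic control loop. The duck-typed `agent` and `env` interfaces below are illustrative stubs, not the article's controller.

```python
def pao_r_loop(agent, env, steps):
    """Generic Plan-Act-Observe-Reflect loop: each iteration plans from
    accumulated memory, acts on the environment, observes the response,
    and appends a reflection to memory to inform the next plan."""
    memory = []
    for _ in range(steps):
        plan = agent.plan(memory)
        obs = env.step(agent.act(plan))
        memory.append(agent.reflect(plan, obs))
    return memory
```

The key structural point is that memory (and hence evidence for auditability) accumulates across iterations, which is what distinguishes this primitive from a stateless ML/RL xApp policy.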
Abstract:Large artificial intelligence models (LAIMs) are increasingly regarded as a core intelligence engine for embodied AI applications. However, the massive parameter scale and computational demands of LAIMs pose significant challenges for resource-limited embodied agents. To address this issue, we investigate quantization-aware collaborative inference (co-inference) for embodied AI systems. First, we develop a tractable approximation for quantization-induced inference distortion. Based on this approximation, we derive lower and upper bounds on the quantization rate-inference distortion function, characterizing its dependence on LAIM statistics, including the quantization bit-width. Next, we formulate a joint quantization bit-width and computation frequency design problem under delay and energy constraints, aiming to minimize the distortion upper bound while ensuring tightness through the corresponding lower bound. Extensive evaluations validate the proposed distortion approximation, the derived rate-distortion bounds, and the effectiveness of the proposed joint design. In particular, both simulations and real-world testbed experiments demonstrate its ability to balance inference quality, latency, and energy consumption in edge embodied AI systems.
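The paper's own distortion approximation is not reproduced here; as background, the following is a minimal sketch of the classic high-rate model for uniform quantization that bit-width/distortion analyses of this kind typically build on, where distortion shrinks as the step size squared (MSE ≈ step²/12).

```python
import numpy as np

def uniform_quantize(x, bits):
    """Uniform mid-rise quantizer over the empirical range of x. The step
    size shrinks as 2**-bits, so the high-rate model predicts a mean
    squared distortion of approximately step**2 / 12."""
    lo, hi = float(x.min()), float(x.max())
    step = (hi - lo) / (2 ** bits)
    q = lo + (np.floor((x - lo) / step) + 0.5) * step
    return np.clip(q, lo, hi), step
```

Each extra bit halves the step size and thus cuts this distortion model by a factor of four, which is the basic tension the joint bit-width/frequency design trades off against delay and energy.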