Abstract:Affine frequency division multiplexing (AFDM) has emerged as a compelling waveform candidate for future wireless networks, owing to its strong resilience to doubly selective channels and its ability to enable the seamless integration of communication and sensing functionalities. Against this backdrop, this article provides a systematic study of AFDM from a standardization perspective. We first introduce the principles of AFDM and discuss the major considerations involved in waveform standardization. We then examine the backward compatibility of AFDM with 4G/5G multi-numerology frameworks and their anticipated evolution, frequency-modulated continuous-wave (FMCW) radar waveforms, and long-range (LoRa) modulation, demonstrating that AFDM can be incorporated into legacy processing chains with limited modification. Key standardization-critical capabilities are further discussed, including multiple-antenna and multi-user support and peak-to-average power ratio (PAPR). Finally, we investigate the potential of AFDM in several emerging scenarios, including non-terrestrial networks (NTN), integrated sensing and communications (ISAC), vehicle-to-everything (V2X), and underwater acoustic (UWA) communications, where severe delay-Doppler dispersion places stringent demands on waveform robustness. Through these explorations, it is shown that AFDM represents a timely and compelling technology for future wireless networks.
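As a rough illustration of the AFDM principle mentioned in the abstract above, the following Python sketch implements the textbook inverse discrete affine Fourier transform (DAFT) as a modulator and the forward DAFT as a demodulator. It is a minimal sketch and not the article's implementation: the function names `afdm_modulate` and `afdm_demodulate` and the chirp parameters `c1` and `c2` used in the example are assumptions chosen for illustration, and no channel, prefix, or detector is modeled.

```python
import cmath
import math


def afdm_modulate(x, c1, c2):
    """Inverse DAFT: s[n] = 1/sqrt(N) * sum_m x[m] * exp(j*2*pi*(c1*n^2 + c2*m^2 + n*m/N))."""
    N = len(x)
    scale = 1.0 / math.sqrt(N)
    return [
        scale * sum(
            x[m] * cmath.exp(2j * math.pi * (c1 * n * n + c2 * m * m + n * m / N))
            for m in range(N)
        )
        for n in range(N)
    ]


def afdm_demodulate(s, c1, c2):
    """Forward DAFT: the conjugate kernel of afdm_modulate, summed over time index n."""
    N = len(s)
    scale = 1.0 / math.sqrt(N)
    return [
        scale * sum(
            s[n] * cmath.exp(-2j * math.pi * (c1 * n * n + c2 * m * m + n * m / N))
            for n in range(N)
        )
        for m in range(N)
    ]


if __name__ == "__main__":
    # 16 QPSK-like symbols; c1 and c2 are illustrative values only (assumption).
    symbols = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j] * 4
    tx = afdm_modulate(symbols, c1=3 / 32, c2=0.01)
    rx = afdm_demodulate(tx, c1=3 / 32, c2=0.01)
    print(max(abs(a - b) for a, b in zip(symbols, rx)))  # round-trip error
```

With `c1 = c2 = 0` the kernel reduces to the ordinary inverse DFT, which is one way to see why AFDM can reuse much of an OFDM processing chain.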
Abstract:Driven by the massive video transmission requirements in the Internet of Everything, semantic communication holds great promise for striking a balance between transmission efficiency and quality. This paper introduces a large-model-driven generative video semantic communication (LGVSC) framework, enabling efficient video semantic transmission under extremely low bandwidth conditions. First, by decoupling the encoder and decoder as well as exposing explicit intermediate semantic representations, LGVSC maintains interpretability, avoiding the black-box behavior commonly observed in end-to-end systems. Next, we introduce a new metric, i.e., the probability-based semantic similarity score (PSSS), which quantifies semantic similarity for complex modalities within a continuous range, allowing for more precise evaluation of semantic content. Building on PSSS, we propose a semantic-guided keyframe extraction module driven by a multimodal large model. This module can enhance fine-grained semantic consistency during keyframe selection at the transmitter, optimizing transmission bandwidth without compromising semantic fidelity. Additionally, we design a generative large-model-driven dynamic semantic-adaptive decoder at the receiver, which can adapt to videos of arbitrary lengths. Simulation results demonstrate that LGVSC significantly outperforms traditional schemes, achieving a channel bandwidth ratio on the order of $10^{-4}$ to $10^{-3}$, while maintaining strong zero-shot generalization across downstream tasks.
Abstract:Extreme data scarcity and inherent multipath spatial ambiguity severely limit existing deep learning-based channel state information (CSI) fingerprinting localization schemes for target unmanned aerial vehicles (UAVs). To overcome these challenges, we propose an end-to-end semi-supervised generative localization framework. First, by exploiting the temporal correlations inherent in continuous flight trajectories, a self-supervised encoder extracts robust spatial features from massive unlabeled CSI sequences to establish structured latent representations. Following this, we utilize a consistency model, a powerful derivative of diffusion architectures, as the core generative backbone to map the learned latent space to physical coordinates, jointly fine-tuning the pre-trained encoder with a strictly limited set of labeled CSI. This consistency formulation models the conditional distribution to resolve the mean collapse problem of discriminative models, while compressing the inference trajectory to 1-2 steps to avoid the latency bottleneck of traditional diffusion models. Furthermore, a lightweight distributed fusion mechanism is designed to aggregate spatial predictions across multiple base stations (BS) from a multi-view geometry perspective. Comprehensive evaluations on a real-world measurement dataset demonstrate that our framework achieves low latency and suppresses the mean localization error to 9.77 cm under a 3-BS fusion setup with only a 1\% label fraction, significantly outperforming existing fully supervised and semi-supervised discriminative baselines.
Abstract:Cell-free massive multiple-input multiple-output (CF-mMIMO) systems provide enhanced coverage and capacity for next-generation wireless networks. However, CF-mMIMO systems face significant challenges in downlink power allocation (PA) due to imperfect channel state information (CSI), severe multi-user interference (MUI), and high computational complexity. To address these issues, rate-splitting multiple access (RSMA) is adopted as a robust interference management strategy. Accordingly, this paper proposes an unsupervised and scalable graph neural network (GNN) framework for PA in rate-splitting CF-mMIMO (RS-CF-mMIMO) systems, relying exclusively on large-scale fading (LSF) coefficients without instantaneous CSI. To resolve the dimensionality mismatch in dynamic networks, we introduce a slice-based adaptive layer that projects variable-dimension features into a fixed latent space. This mechanism enables a unified model to generalize across diverse topologies without retraining. Within this architecture, the sum spectral efficiency (SE) is maximized under per-AP power constraints, assuming maximum-ratio precoding for common streams and regularized zero-forcing precoding for private streams. We also derive a weighted minimum mean-square error-alternating direction method of multipliers (WMMSE-ADMM) algorithm as a performance upper bound. Extensive simulations verify that the proposed GNN framework achieves near-optimal SE and outperforms unsupervised deep neural networks (DNNs) across diverse system sizes and pilot assignment schemes. Furthermore, the scalable variant maintains robust performance while reducing the trainable parameter count by over 57% relative to DNNs and decreasing inference latency by up to three orders of magnitude compared with WMMSE-ADMM.
Abstract:Parsing chemical reaction diagrams from scientific literature is challenging due to heterogeneous layouts, intertwined visual elements, and the difficulty of integrating recognition and reasoning. Existing vision-language models advance multimodal understanding but still fail on complex diagrams, struggling to maintain spatial coherence and to integrate multidimensional information during reasoning. To address these issues, we propose MACReD, a hierarchical multi-agent framework that coordinates specialized agents for molecular perception, arrow understanding, text extraction, and reaction reconstruction within a unified VLM-guided architecture. The planning and perception layers use flexible, fine-grained detection to handle visual complexity, while the reasoning layer uses a multigraph fusion mechanism to integrate heterogeneous cues and enforce chemically consistent global reasoning. Experiments on the RxnScribe benchmark show that MACReD achieves state-of-the-art performance, with F1 scores of 75.2% and 84.6% under hard and soft match criteria, outperforming the RxnScribe baseline, which obtains 69.1% and 80.0%, respectively. These results demonstrate the robustness of MACReD across diverse diagram layouts, including multi-step and tree-structured reactions.
Abstract:This paper investigates waveform design for 6G integrated sensing and communication (ISAC) systems, with a particular focus on the practical limitations imposed by imperfect full-duplex radios. Under such imperfections, continuous communication waveforms, such as orthogonal frequency-division multiplexing (OFDM), suffer from severe full-duplex residual self-interference (RSI) for radar sensing, which significantly restricts the long-range sensing capabilities required by emerging low-altitude wireless networks (LAWN). To address this challenge, we propose a novel time-division ISAC waveform that integrates a specially developed dual-power phase-coded pulse for sensing into the communication frame under full-duplex RSI. Specifically, the dual-power sensing pulse consists of a high-power sequence followed by a low-power sequence, effectively exploiting imperfect full-duplex operations to achieve reliable long-range sensing while eliminating the detection blind range inherent to conventional half-duplex pulse radars. Furthermore, a complementary and inverse-phase sequence group is designed to ensure perfect autocorrelation and robust cross-correlation sidelobe suppression, so as to enhance multi-target detection capability. For sensing signal processing, a parameterized mismatched filter tailored to the proposed pulse structure is developed and optimized to maximize the detection performance. In addition, we design a hierarchical one-dimensional cell-averaging constant false alarm rate (CA-CFAR) detector that exploits the perfect range-domain autocorrelation characteristics of the proposed waveform to further improve the detection performance. Extensive simulations demonstrate that the proposed design significantly improves the maximum detection range and multi-target detection capability compared to existing OFDM and linear frequency-modulated (LFM) pulse baselines, while effectively covering the blind range for targets with small radar cross-section (RCS).
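For readers unfamiliar with the detector family named in the abstract above, the following Python sketch shows a plain one-dimensional cell-averaging CFAR test on a range power profile. It is a generic textbook version under stated assumptions, not the hierarchical detector proposed in the paper: the function name `ca_cfar`, the parameters `num_train`, `num_guard`, and `scale`, and the example values are all hypothetical.

```python
def ca_cfar(power, num_train, num_guard, scale):
    """Plain 1-D cell-averaging CFAR (generic sketch, not the paper's hierarchical design).

    For each cell under test, the noise level is estimated as the mean of
    `num_train` training cells on each side, skipping `num_guard` guard cells
    next to the cell under test. A detection is declared when the cell power
    exceeds `scale` times that estimate. Edge cells without a full window are
    skipped.
    """
    detections = []
    half = num_train + num_guard
    for i in range(half, len(power) - half):
        leading = power[i - half : i - num_guard]
        trailing = power[i + num_guard + 1 : i + half + 1]
        noise = (sum(leading) + sum(trailing)) / (2 * num_train)
        if power[i] > scale * noise:
            detections.append(i)
    return detections


if __name__ == "__main__":
    # Flat noise floor with one strong return at range bin 20 (synthetic data).
    profile = [1.0] * 40
    profile[20] = 50.0
    print(ca_cfar(profile, num_train=4, num_guard=2, scale=5.0))
```

The guard cells keep energy leaking from the target out of the noise estimate; a waveform with low range sidelobes reduces how much such leakage there is to begin with.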
Abstract:High-mobility uncrewed aerial vehicle (UAV) communications in low-altitude wireless networks (LAWN) demand reliable beamforming, while conventional feedback-based schemes suffer from excessive overhead and severe misalignment under rapid trajectory variations. To address this challenge, this paper proposes an SSB-based sensing-assisted predictive robust beamforming framework that replaces explicit channel state information (CSI) feedback with sensing-driven state estimation and uncertainty-aware optimization. Leveraging the periodic 'always-on' synchronization signal block (SSB), a hierarchical sensing algorithm tailored for hybrid digital-analog uniform planar arrays is developed, combining 2D range-velocity profiling and augmented beamspace multiple signal classification (MUSIC). By integrating a locally-focused analog receive beamformer, the proposed sensing design can ensure energy accumulates across different radio-frequency (RF) chains while resolving angular ambiguity. An extended Kalman filter (EKF) is further employed to track UAV states between sparse synchronization-signal (SS) bursts, and a covariance correction is introduced to characterize maneuver-induced prediction uncertainties. Based on the derived statistical distributions of range and angular parameters, the communication channel is modeled through predictive correlation matrices rather than instantaneous CSI, leading to a multi-user robust beamforming formulation that maximizes average network sum-rate under uncertainty. The resulting nonconvex problem is efficiently solved via successive convex approximation and alternating minimization. Simulation results demonstrate that the proposed framework significantly enhances spectral efficiency and link stability compared with feedback-based beamforming and non-robust beamforming design, particularly in high-mobility and large-SSB-interval scenarios.
Abstract:Affine frequency division multiplexing (AFDM), an emerging multi-carrier modulation scheme, has garnered significant attention due to its resilience to Doppler shifts and capability to achieve full diversity in doubly dispersive channels. However, existing data detection algorithms for AFDM systems face a significant trade-off between computational complexity and accuracy. In this paper, a novel low-complexity data detection scheme, termed the soft-feedback detector (SFD), is proposed. Building upon a maximum ratio combining (MRC) estimator framework, the SFD leverages the a priori symbol distribution to mitigate error propagation during iterative detection. Specifically, soft-decision feedback is incorporated as extrinsic information derived from the log-likelihood ratios of the transmitted symbols. As a result, the proposed detector significantly enhances detection accuracy while maintaining low computational complexity. Simulation results demonstrate that the SFD consistently outperforms benchmark decision-feedback detectors. In particular, compared with the conventional MRC detector, the proposed scheme achieves a signal-to-noise ratio (SNR) gain of approximately 3 dB at a bit error rate (BER) of $10^{-3}$.
Abstract:Modern digital services have evolved into indispensable tools, driving today's large-scale information systems. Yet the prevailing platform-centric model, where services are optimized for platform-driven metrics such as engagement and conversion, often fails to align with users' true needs. While platform technologies have advanced significantly, especially with the integration of large language models (LLMs), we argue that improvements in platform service quality do not necessarily translate to genuine user benefit. Instead, platform-centric services prioritize provider objectives over user welfare, resulting in conflicts with user interests. This paper argues that the future of digital services should shift from platform-centric services to user-centric agents. These user-centric agents prioritize privacy, align with user-defined goals, and grant users control over their preferences and actions. With advancements in LLMs and on-device intelligence, the realization of this vision is now feasible. This paper explores the opportunities and challenges in transitioning to user-centric intelligence, presents a practical device-cloud pipeline for its implementation, and discusses the necessary governance and ecosystem structures for its adoption.
Abstract:In wideband near-field arrays, frequency-dependent array responses cause wavefronts at different frequencies to deviate from that at the center frequency, producing beam squint and degrading multi-user performance. True-time-delay (TTD) circuits can realign the frequency dependence but require large delay ranges and intricate calibration, limiting scalability. Another line of work explores one- and two-dimensional array geometries, including linear, circular, and concentric circular, that exhibit distinct broadband behaviors such as different beam-squint sensitivities and focusing characteristics. These observations motivate adapting the array layout to enable wideband-friendly focusing and enhance multi-user performance without TTD networks. We propose a movable antenna (MA) aided architecture based on hierarchical sub-connected hybrid beamforming (HSC-HBF) in which antennas are grouped into tiles and only the tile centers are repositioned, providing slow geometric degrees of freedom that emulate TTD-like broadband focusing while keeping hardware and optimization complexity low. We show that the steering vector is inherently frequency dependent and that reconfiguring tile locations improves broadband focusing. Simulations across wideband near-field scenarios demonstrate robust squint suppression and consistent gains over fixed-layout arrays, achieving up to 5\% higher sum rate, with the maximum improvement exceeding 140\%.
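The beam-squint effect described in the abstract above can be illustrated with a much simpler setting than the paper's. The Python sketch below assumes a far-field uniform linear array with half-wavelength spacing at the center frequency and phase shifters matched to the center frequency only; it does not model the near-field geometry, the movable-antenna tiles, or the HSC-HBF architecture. The function name `normalized_gain` and the example numbers are hypothetical.

```python
import cmath
import math


def normalized_gain(num_antennas, freq_ratio, angle_deg):
    """Normalized array gain at f = freq_ratio * fc toward angle_deg.

    Assumptions (simplified, far-field): uniform linear array, element spacing
    of half a wavelength at fc, and frequency-flat phase-shifter weights
    matched to the steering vector at fc.
    """
    sin_theta = math.sin(math.radians(angle_deg))
    total = 0j
    for n in range(num_antennas):
        # Array response at the actual frequency f.
        steer = cmath.exp(-1j * math.pi * freq_ratio * n * sin_theta)
        # Phase-shifter weight designed at the center frequency fc.
        weight = cmath.exp(-1j * math.pi * n * sin_theta)
        total += weight.conjugate() * steer
    return abs(total) / num_antennas


if __name__ == "__main__":
    # Gain toward 60 degrees at the center frequency and at a 10% offset.
    print(normalized_gain(64, 1.0, 60.0))
    print(normalized_gain(64, 1.1, 60.0))
```

Because the weights are fixed while the array response scales with frequency, the gain toward the intended direction drops away from the center frequency, which is the loss that TTD circuits or a reconfigured geometry aim to recover.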