Abstract:Acquiring the channel state information from limited and noisy observations at pilot positions is critical for wireless multiple-input multiple-output (MIMO)-orthogonal frequency division multiplexing (OFDM) systems. In this paper, we view this process as a conditional generative task in which the partial noisy channel estimates at the pilots are utilized as a ``prompt'' to guide the diffusion ``inpainting'' of the underlying channel. To this end, we resort to a general Conditional Diffusion Transformer (CDiT) framework with a well-designed network architecture and update rule. In particular, we design a dedicated embedding strategy to encode and adapt to different pilot patterns and noise levels, and utilize a special cross-attention mechanism to align the partial raw channel observations with the denoised channel at each time step of the generation process. This architecture effectively anchors the diffusion process, enabling the model to accurately recover full channel details from limited noisy observations. Comprehensive experimental results show that, the proposed approach achieves a performance gain of over 5 dB compared to the baselines under varying noise conditions, and provides robust channel acquisition even under a sparse pilot density of 1/32 without significant performance loss compared to the denser pilot cases. Moreover, it is capable of generating high-quality channel matrices within just 10 inference steps, effectively balancing estimation accuracy with computational efficiency and inference speed. Ablation studies demonstrate the rationality of the model design and the necessity of its modules.
Abstract:Digital twin (DT) technology offers transformative potential for vehicular networks, enabling high-fidelity virtual representations for enhanced safety and automation. However, seamless DT synchronization in dynamic environments faces challenges such as massive data transmission, precision sensing, and strict computational constraints. This paper proposes an integrated sensing, computing, and semantic communication (ISCSC) framework tailored for DT-assisted vehicular networks in the near-field (NF) regime. Leveraging a multi-user multiple-input multiple-output (MU-MIMO) configuration, each roadside unit (RSU) employs semantic communication to serve vehicles while simultaneously utilizing millimeter-wave (mmWave) radar for environmental mapping. We implement particle filtering at RSUs to achieve high-precision vehicle tracking. To optimize performance, we formulate a joint optimization problem balancing semantic communication rates and sensing accuracy under limited computational resources and power budget. Our solution includes a hybrid heuristic algorithm for vehicle-to-RSU assignment and an alternating optimization approach for determining semantic extraction ratios and beamforming matrices. Performance is extensively evaluated via the Cramér-Rao bound (CRB) for angle and distance estimation, semantic transmission rates, and resource utilization. Numerical results demonstrate that the proposed ISCSC framework achieves a 20% improvement in transmission rate while maintaining the sensing accuracy of existing integrated sensing and communication (ISAC) schemes under constrained resource conditions.
Abstract:Large language models (LLMs) exhibit strong reasoning and conversational abilities, but ensuring reliable behavior in multi-turn interactions remains challenging. In many real-world applications, agents must succeed in one-shot settings where retries are impossible. Existing approaches either rely on reflection or post-hoc evaluation, which require additional attempts, or assume fully trainable models that cannot leverage proprietary LLMs. We propose an asymmetric actor-critic framework for reliable conversational agents. A powerful proprietary LLM acts as the actor, while a smaller open-source critic provides runtime supervision, monitoring the actor's actions and intervening within the same interaction trajectory. Unlike training-based actor-critic methods, our framework supervises a fixed actor operating in open-ended conversational environments. The design leverages a generation-verification asymmetry: while high-quality generation requires large models, effective oversight can often be achieved by smaller ones. We further introduce a data generation pipeline that produces supervision signals for critic fine-tuning without modifying the actor. Experiments on $τ$-bench and UserBench show that our approach significantly improves reliability and task success over strong single-agent baselines. Moreover, lightweight open-source critics rival or surpass larger proprietary models in the critic role, and critic fine-tuning yields additional gains over several state-of-the-art methods.
Abstract:In this paper, we investigate the performance of a fluid antenna relay (FAR)-assisted downlink communication system utilizing non-orthogonal multiple access (NOMA). The FAR, which integrates a fluid antenna system (FAS), is equipped on an autonomous aerial vehicle (AAV), and introduces extra degrees of freedom to improve the performance of the system. The transmission is divided into a first phase from the base station (BS) to the users and the FAR, and a second phase where the FAR forwards the signal using amplify-and-forward (AF) or decode-and-forward (DF) relaying to reduce the outage probability (OP) for the user maintaining weaker channel conditions. To analyze the OP performance of the weak user, Copula theory and the Gaussian copula function are employed to model the statistical distribution of the FAS channels. Analytical expressions for weak user's OP are derived for both the AF and the DF schemes. Simulation results validate the effectiveness of the proposed scheme, showing that it consistently outperforms benchmark schemes without the FAR. In addition, numerical simulations also demonstrate the values of the relaying scheme selection parameter under different FAR positions and communication outage thresholds.
Abstract:Learning-based semantic communication (SemCom) has recently emerged as a promising paradigm for improving the transmission efficiency of wireless networks. However, existing methods typically rely on extensive end-to-end training, which is both inflexible and computationally expensive in dynamic wireless environments. Moreover, they fail to exploit redundancy across multiple transmissions of semantically similar content, limiting overall efficiency. To overcome these limitations, we propose a channel-aware generative adversarial network (GAN) inversion-based joint source-channel coding (CAGI-JSCC) framework that enables training-free SemCom by leveraging a pre-trained SemanticStyleGAN model. By explicitly incorporating wireless channel characteristics into the GAN inversion process, CAGI-JSCC adapts to varying channel conditions without additional training. Furthermore, we introduce a cache-enabled dynamic codebook (CDC) that caches disentangled semantic components at both the transmitter and receiver, allowing the system to reuse previously transmitted content. This semantic-level caching can continuously reduce redundant transmissions as experience accumulates. Extensive experiments on image transmission demonstrate the effectiveness of the proposed framework. In particular, our system achieves comparable perceptual quality with an average bandwidth compression ratio (BCR) of 1/224, and as low as 1/1024 for a single image, significantly outperforming baselines with a BCR of 1/128.
Abstract:Generating high-quality 360° panoramic videos from perspective input is one of the crucial applications for virtual reality (VR), whereby high-resolution videos are especially important for immersive experience. Existing methods are constrained by computational limitations of vanilla diffusion models, only supporting $\leq$ 1K resolution native generation and relying on suboptimal post super-resolution to increase resolution. We introduce CubeComposer, a novel spatio-temporal autoregressive diffusion model that natively generates 4K-resolution 360° videos. By decomposing videos into cubemap representations with six faces, CubeComposer autoregressively synthesizes content in a well-planned spatio-temporal order, reducing memory demands while enabling high-resolution output. Specifically, to address challenges in multi-dimensional autoregression, we propose: (1) a spatio-temporal autoregressive strategy that orchestrates 360° video generation across cube faces and time windows for coherent synthesis; (2) a cube face context management mechanism, equipped with a sparse context attention design to improve efficiency; and (3) continuity-aware techniques, including cube-aware positional encoding, padding, and blending to eliminate boundary seams. Extensive experiments on benchmark datasets demonstrate that CubeComposer outperforms state-of-the-art methods in native resolution and visual quality, supporting practical VR application scenarios. Project page: https://lg-li.github.io/project/cubecomposer
Abstract:Reinforcement learning (RL) post-training has recently driven major gains in long chain-of-thought reasoning large language models (LLMs), but the high inference cost of such models motivates distillation into smaller students. Most existing knowledge distillation (KD) methods are designed for supervised fine-tuning (SFT), relying on fixed teacher traces or teacher-student Kullback-Leibler (KL) divergence-based regularization. When combined with RL, these approaches often suffer from distribution mismatch and objective interference: teacher supervision may not align with the student's evolving rollout distribution, and the KL regularizer can compete with reward maximization and require careful loss balancing. To address these issues, we propose RL-aware distillation (RLAD), which performs selective imitation during RL -- guiding the student toward the teacher only when it improves the current policy update. Our core component, Trust Region Ratio Distillation (TRRD), replaces the teacher-student KL regularizer with a PPO/GRPO-style likelihood-ratio objective anchored to a teacher--old-policy mixture, yielding advantage-aware, trust-region-bounded distillation on student rollouts and naturally balancing exploration, exploitation, and imitation. Across diverse logic reasoning and math benchmarks, RLAD consistently outperforms offline distillation, standard GRPO, and KL-based on-policy teacher-student knowledge distillation.
Abstract:It is commonly believed that gradient compression in federated learning (FL) enjoys significant improvement in communication efficiency with negligible performance degradation. In this paper, we find that gradient compression induces sharper loss landscapes in federated learning, particularly under non-IID data distributions, which suggests hindered generalization capability. The recently emerging Sharpness Aware Minimization (SAM) effectively searches for a flat minima by incorporating a gradient ascent step (i.e., perturbing the model with gradients) before the celebrated stochastic gradient descent. Nonetheless, the direct application of SAM in FL suffers from inaccurate estimation of the global perturbation due to data heterogeneity. Existing approaches propose to utilize the model update from the previous communication round as a rough estimate. However, its effectiveness is hindered when model update compression is incorporated. In this paper, we propose FedSynSAM, which leverages the global model trajectory to construct synthetic data and facilitates an accurate estimation of the global perturbation. The convergence of the proposed algorithm is established, and extensive experiments are conducted to validate its effectiveness.
Abstract:Flow matching has emerged as a powerful framework for generative modeling, with recent empirical successes highlighting the effectiveness of signal-space prediction ($x$-prediction). In this work, we investigate the transfer of this paradigm to binary manifolds, a fundamental setting for generative modeling of discrete data. While $x$-prediction remains effective, we identify a latent structural mismatch that arises when it is coupled with velocity-based objectives ($v$-loss), leading to a time-dependent singular weighting that amplifies gradient sensitivity to approximation errors. Motivated by this observation, we formalize prediction-loss alignment as a necessary condition for flow matching training. We prove that re-aligning the objective to the signal space ($x$-loss) eliminates the singular weighting, yielding uniformly bounded gradients and enabling robust training under uniform timestep sampling without reliance on heuristic schedules. Finally, with alignment secured, we examine design choices specific to binary data, revealing a topology-dependent distinction between probabilistic objectives (e.g., cross-entropy) and geometric losses (e.g., mean squared error). Together, these results provide theoretical foundations and practical guidelines for robust flow matching on binary -- and related discrete -- domains, positioning signal-space alignment as a key principle for robust diffusion learning.
Abstract:The integration of sensing and communication (ISAC) is an essential function of future wireless systems. Due to its large available bandwidth, millimeter-wave (mmWave) ISAC systems are able to achieve high sensing accuracy. In this paper, we consider the multiple base-station (BS) collaborative sensing problem in a multi-input multi-output (MIMO) orthogonal frequency division multiplexing (OFDM) mmWave communication system. Our aim is to sense a remote target shape with the collected signals which consist of both the reflection and scattering signals. We first characterize the mmWave's scattering and reflection effects based on the Lambertian scattering model. Then we apply the periodogram technique to obtain rough scattering point detection, and further incorporate the subspace method to achieve more precise scattering and reflection point detection. Based on these, a reconstruction algorithm based on Hough Transform and principal component analysis (PCA) is designed for a single convex polygon target scenario. To improve the accuracy and completeness of the reconstruction results, we propose a method to further fuse the scattering and reflection points. Extensive simulation results validate the effectiveness of the proposed algorithms.