National Mobile Communications Research Laboratory, Southeast University, Nanjing, China
Abstract: This paper investigates narrowband coordinated user scheduling in multi-cell massive multiple-input multiple-output (MIMO) systems. We formulate the problem under a spectral-efficiency maximization criterion, revealing inherent challenges in computational complexity and signaling overhead. To address these, we develop a user-scheduling-oriented channel knowledge map (CKM), termed US-CKM, and a US-CKM-driven two-stage coordinated scheduling framework. By exploiting the mapping between location information and statistical channel state information (SCSI), the system enables rapid SCSI retrieval and persistent reuse, substantially reducing CSI acquisition overhead. Embedding statistical channel correlation into the CKM further characterizes inter-user interference patterns. The framework designs an intra-cell active-user selection scheme for the first stage and an inter-cell coordinated scheduling scheme for the second, both based on US-CKM entries. The first stage identifies users with favorable channel gains and low intra-cell interference, reducing the candidate set with marginal sum-rate loss. The second stage suppresses inter-cell interference (ICI) by exploiting cross-cell channel correlations. To enhance robustness against imperfect SCSI in dynamic scattering environments, we augment the framework with a reliability-guided mechanism. Instead of treating all entries uniformly, we evaluate the stability of each entry using a grid reliability metric that quantifies the variance of channel measurements at sampling locations. Low-reliability grids are identified, and their instantaneous CSI is acquired in real time and integrated with the existing SCSI. This process refines the channel gain and spatial correlation characteristics, ensuring robust performance under imperfect conditions.
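The first-stage idea in the abstract above (prefer users with high channel gain and low intra-cell correlation) can be illustrated with a small greedy sketch. This is not the paper's algorithm: the function name `select_active_users`, the score `gain * (1 - penalty * worst_correlation)`, and the input layout are all assumptions made for illustration, with `gain` and `corr` standing in for values that would be read from US-CKM entries.

```python
def select_active_users(gain, corr, num_active, penalty=1.0):
    """Greedy intra-cell active-user selection (illustrative sketch only).

    gain[u]    : statistical channel gain of user u (hypothetical US-CKM entry)
    corr[u][s] : statistical channel correlation between users u and s, in [0, 1]
    num_active : number of users to keep in the candidate set
    penalty    : assumed weight on correlation with already selected users
    """
    selected = []
    candidates = set(range(len(gain)))
    while candidates and len(selected) < num_active:

        def score(u):
            # Worst-case correlation with the users picked so far (0 if none).
            worst = max((corr[u][s] for s in selected), default=0.0)
            return gain[u] * (1.0 - penalty * worst)

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Users 0 and 1 are strong but highly correlated; user 2 is weaker but nearly orthogonal.
gain = [1.0, 0.9, 0.5]
corr = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
chosen = select_active_users(gain, corr, num_active=2)
```

In this toy case the second pick is user 2 rather than the stronger user 1, because user 1 is penalized for its high correlation with user 0.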
Abstract: In near-field extremely large-scale multiple-input multiple-output (XL-MIMO) systems, spherical wavefront propagation expands the traditional beam codebook into the joint angular-distance domain, rendering conventional beam training prohibitively inefficient, especially in complex 3-dimensional (3D) low-altitude environments. Furthermore, since near-field beam variations are deeply coupled not only with user positions but also with the physical surroundings, precise beam alignment demands a thorough understanding of the environment. To address this, we propose a large language model (LLM)-driven multimodal framework that fuses historical GPS data, RGB images, LiDAR data, and strategically designed task-specific textual prompts. By utilizing the powerful emergent reasoning and generalization capabilities of the LLM, our approach learns complex spatial dynamics to achieve superior environmental comprehension...
Abstract: Accurate beam prediction is a key enabler for next-generation wireless communication systems. In this paper, we propose a multimodal large language model (LLM)-based beam prediction framework that effectively utilizes contextual information provided by sensory data, including RGB camera images and LiDAR point clouds. To fuse heterogeneous modalities, we design specialized modality encoders together with a beam-guided attention masking mechanism and a high-frequency temporal alignment strategy, enabling robust cross-modal feature integration in dynamic environments. Furthermore, we construct a large-scale multimodal dataset for communication, named Multimodal-Wireless, which covers diverse weather and traffic conditions with high-fidelity ray-tracing labels. Extensive simulation results demonstrate that the proposed approach significantly reduces the reliance on oracle angle-of-departure knowledge and consistently outperforms state-of-the-art multimodal LLM-based beam prediction methods in terms of beam accuracy and communication performance, improving the average Top-1 accuracy to 80.8% and the average normalized gain to 89.1%.
Abstract: Deep generative models offer a powerful alternative to conventional channel estimation by learning complex channel distributions. By integrating the rich environmental information available in modern sensing-aided networks, this paper proposes MultiCE-Flow, a multimodal channel estimation framework based on flow matching and diffusion transformer (DiT). We design a specialized multimodal perception module that fuses LiDAR, camera, and location data into a semantic condition, while treating sparse pilots as a structural condition. These conditions guide a DiT backbone to reconstruct high-fidelity channels. Unlike standard diffusion models, we employ flow matching to learn a linear trajectory from noise to data, enabling efficient one-step sampling. By leveraging environmental semantics, our method mitigates the ill-posed nature of estimation with sparse pilots. Extensive experiments demonstrate that MultiCE-Flow consistently outperforms traditional baselines and existing generative models. Notably, it exhibits superior robustness to out-of-distribution scenarios and varying pilot densities, making it suitable for environment-aware communication systems.
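The "linear trajectory from noise to data" and "one-step sampling" mentioned in the abstract above can be made concrete with a generic flow-matching sketch. It is a minimal illustration under stated assumptions, not MultiCE-Flow itself: there is no DiT backbone and no semantic or pilot conditioning, and the helper names (`interpolate`, `target_velocity`, `fm_loss`, `one_step_sample`, `point_mass_velocity`) are hypothetical. The toy "model" is the exact velocity field for a data distribution concentrated on a single point, used only so the example is self-checking.

```python
def interpolate(x0, x1, t):
    """Point on the straight line from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the linear path; it is constant in t and equals x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def fm_loss(model, x0, x1, t):
    """Flow-matching regression loss for a single (noise, data, time) triple."""
    xt = interpolate(x0, x1, t)
    pred = model(xt, t)
    tgt = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred, tgt)) / len(tgt)


def one_step_sample(model, x0):
    """Single Euler step from t=0 to t=1: x1_hat = x0 + v(x0, 0)."""
    v = model(x0, 0.0)
    return [a + b for a, b in zip(x0, v)]


def point_mass_velocity(target):
    """Toy oracle (assumption): exact velocity field when all data equals `target`."""
    def model(x, t):
        return [(c - xi) / (1.0 - t) for c, xi in zip(target, x)]
    return model


data_point = [2.0, 3.0]
noise = [0.5, -1.0]
oracle = point_mass_velocity(data_point)
loss = fm_loss(oracle, noise, data_point, t=0.3)
sample = one_step_sample(oracle, noise)
```

Because the learned path is a straight line, one Euler step of size 1 lands on the data point when the velocity field is exact; with a trained network the single step is only as good as the learned velocity.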
Abstract: Robust and accurate navigation is critical for Unmanned Aerial Vehicles (UAVs), especially for those with stringent Size, Weight, and Power (SWaP) constraints. However, most state-of-the-art (SOTA) LiDAR-Inertial Odometry (LIO) systems still suffer from estimation inconsistency and computational bottlenecks when deployed on such platforms. To address these issues, this paper proposes a consistent and efficient tightly-coupled LIO framework tailored for UAVs. Within the efficient Multi-State Constraint Kalman Filter (MSCKF) framework, we build coplanar constraints inferred from planar features observed across a sliding window. By applying null-space projection to sliding-window coplanar constraints, we eliminate the direct dependency on feature parameters in the state vector, thereby mitigating overconfidence and improving consistency. More importantly, to further boost efficiency, we introduce a parallel voxel-based data association and a novel compact cluster-to-plane measurement model. This compact measurement model losslessly reduces observation dimensionality and significantly accelerates the update process. Extensive evaluations demonstrate that our method outperforms most SOTA approaches by providing a superior balance of consistency and efficiency. It exhibits improved robustness in degenerate scenarios, achieves the lowest memory usage via its map-free nature, and runs in real time on resource-constrained embedded platforms (e.g., NVIDIA Jetson TX2).
Abstract: Satellite communications face severe bottlenecks in supporting high-fidelity synchronized audiovisual services, as conventional schemes struggle with cross-modal coherence under fluctuating channel conditions, limited bandwidth, and long propagation delays. To address these limitations, this paper proposes an adaptive multimodal semantic transmission system tailored for satellite scenarios, aiming for high-quality synchronized audiovisual reconstruction under bandwidth constraints. Unlike static schemes with fixed modal priorities, our framework features a dual-stream generative architecture that flexibly switches between video-driven audio generation and audio-driven video generation. This allows the system to dynamically decouple semantics, transmitting only the most important modality while employing cross-modal generation to recover the other. To balance reconstruction quality and transmission overhead, a dynamic keyframe update mechanism adaptively maintains the shared knowledge base according to wireless scenarios and user requirements. Furthermore, a large language model-based decision module is introduced to enhance system adaptability. By integrating satellite-specific knowledge, this module jointly considers task requirements and channel factors such as weather-induced fading to proactively adjust transmission paths and generation workflows. Simulation results demonstrate that the proposed system significantly reduces bandwidth consumption while achieving high-fidelity audiovisual synchronization, improving transmission efficiency and robustness in challenging satellite scenarios.
Abstract: This paper proposes an Adaptive-Growth Randomized Neural Network (AG-RaNN) method for computing multivalued solutions of nonlinear first-order PDEs with hyperbolic characteristics, including quasilinear hyperbolic balance laws and Hamilton--Jacobi equations. Such solutions arise in geometric optics, seismic waves, the semiclassical limit of quantum dynamics, and the high-frequency limit of linear waves, and differ markedly from viscosity or entropic solutions. The main computational challenge is that, after singularities form, the solutions are no longer functions but unions of multiple branches. Level-set formulations offer a systematic alternative by embedding the nonlinear dynamics into linear transport equations posed in an augmented phase space, at the price of substantially increased dimensionality. To alleviate this computational burden, we combine AG-RaNN with an adaptive collocation strategy that concentrates samples in a tubular neighborhood of the zero level set, together with a layer-growth mechanism that progressively enriches the randomized feature space. Under standard regularity assumptions on the transport field and the characteristic flow, we establish a convergence result for the AG-RaNN approximation of the level-set equations. Numerical experiments demonstrate that the proposed method can efficiently recover multivalued structures and resolve nonsmooth features in high-dimensional settings.
Abstract: This paper investigates the power control problem in wireless networks by repurposing pre-trained large language models (LLMs) as relational reasoning backbones. In hyper-connected interference environments, traditional optimization methods face high computational cost, while standard message passing neural networks suffer from aggregation bottlenecks that can obscure critical high-interference structures. In response, we propose PC-LLM, a physics-informed framework that augments a pre-trained Transformer with an interference-aware attention bias. The proposed bias tuning mechanism injects the physical channel gain matrix directly into the self-attention logits, enabling explicit fusion of wireless topology with pre-trained relational priors without retraining the backbone from scratch. Extensive experiments demonstrate that PC-LLM consistently outperforms both traditional optimization methods and state-of-the-art graph neural network baselines, while exhibiting exceptional zero-shot generalization to unseen environments. We further observe a structural-semantic decoupling phenomenon: topology-relevant relational reasoning is concentrated in shallow layers, whereas deeper layers encode task-irrelevant semantic noise. Motivated by this finding, we develop a lightweight adaptation strategy that reduces model depth by 50\%, significantly lowering inference cost while preserving state-of-the-art spectral efficiency.
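The phrase "injects the physical channel gain matrix directly into the self-attention logits" in the abstract above describes an additive attention bias. A minimal single-head sketch follows. The specific bias form `alpha * log(gain[i][j])`, the function name `biased_attention`, and the scalar `alpha` are assumptions chosen for illustration; the abstract does not specify how PC-LLM maps gains to biases.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(q, k, v, gain, alpha=1.0):
    """Scaled dot-product attention with an additive bias on the logits.

    q, k, v : lists of token vectors (one token per link or node)
    gain    : gain[i][j] > 0, channel gain between nodes i and j
    alpha   : assumed scalar controlling the bias strength
    Assumption: the bias is alpha * log(gain[i][j]); other mappings are possible.
    """
    d = len(q[0])
    outputs, weights = [], []
    for i in range(len(q)):
        logits = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            + alpha * math.log(gain[i][j])
            for j in range(len(k))
        ]
        w = softmax(logits)
        weights.append(w)
        outputs.append(
            [sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))]
        )
    return outputs, weights


# With zero queries the content term vanishes, so attention follows the gains alone.
q = [[0.0, 0.0], [0.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0]]
gain = [[1.0, 3.0], [2.0, 2.0]]
out, attn = biased_attention(q, k, v, gain)
```

With `alpha = 1` and a log-domain bias, the attention weights in this degenerate case are simply the row-normalized gains, which shows how the physical topology can steer attention without touching the pre-trained projection weights.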
Abstract: Millimeter-wave massive multiple-input multiple-output systems employ highly directional beamforming to overcome severe path loss, and their performance critically depends on accurate beam alignment. Conventional codebook-based methods offer low training overhead but suffer from limited angular resolution and sensitivity to hardware impairments. To address these challenges, we propose a deep learning-enhanced super-resolution beam alignment framework with three key components. First, we design the Quaternary Search-based Super-Resolution (QSSR) algorithm, which leverages the monotonic power ratio property between two discrete Fourier transform (DFT) codebook beams to achieve super-resolution angle estimation without increasing measurement complexity relative to binary search. Second, we develop QSSR-Net, a gated recurrent unit-based neural network that exploits sequential multi-layer beam measurements to capture angular dependencies, thereby improving estimation accuracy, robustness to noise, and generalization across diverse propagation environments. Third, to mitigate the adverse effects of hardware impairments such as antenna position and phase errors, we propose a parametric self-calibration method that requires no additional hardware overhead and adapts compensation parameters in real time. Simulation results show that the proposed framework consistently outperforms binary search and even exhaustive search at high signal-to-noise ratios, achieving substantial performance gains while maintaining low overhead.
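The "monotonic power ratio property between two DFT codebook beams" referred to in the abstract above can be checked numerically in a simplified setting. The sketch below assumes a single noise-free path, a uniform linear array with half-wavelength spacing, and two adjacent DFT beams; it inverts the power ratio by bisection. It is not the QSSR algorithm, and the names `beam_power` and `estimate_spatial_freq` are hypothetical.

```python
import cmath
import math


def beam_power(u, u_beam, n_ant):
    """Power received through a DFT beam steered to u_beam.

    u and u_beam are normalized spatial frequencies (for half-wavelength
    spacing, u = 0.5 * sin(angle)); a single unit-gain path is assumed.
    """
    resp = sum(
        cmath.exp(2j * math.pi * n * (u - u_beam)) for n in range(n_ant)
    ) / n_ant
    return abs(resp) ** 2


def estimate_spatial_freq(p_left, p_right, u_left, n_ant, iters=50):
    """Recover u in (u_left, u_left + 1/N) from the power ratio of two adjacent beams.

    Between adjacent DFT beams the ratio p_left / p_right decreases
    monotonically as u moves from the left beam to the right beam,
    so the ratio can be inverted by bisection.
    """
    u_right = u_left + 1.0 / n_ant
    target = p_left / p_right
    lo, hi = u_left, u_right
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        ratio = beam_power(mid, u_left, n_ant) / beam_power(mid, u_right, n_ant)
        if ratio > target:
            lo = mid  # ratio still too large: the true u lies further right
        else:
            hi = mid
    return 0.5 * (lo + hi)


n_ant = 16
u_left = 2.0 / n_ant          # beam index 2 on the DFT grid
u_true = 0.13                 # off-grid spatial frequency between beams 2 and 3
p_left = beam_power(u_true, u_left, n_ant)
p_right = beam_power(u_true, u_left + 1.0 / n_ant, n_ant)
u_hat = estimate_spatial_freq(p_left, p_right, u_left, n_ant)
```

Two power measurements are enough here to locate an off-grid direction, which is the sense in which the ratio yields resolution finer than the codebook spacing; with noise or multipath the inversion would only be approximate.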
Abstract: The joint optimization of the integer matrix $\mathbf{A}$ and the power scaling matrix $\mathbf{D}$ is central to achieving the capacity-approaching performance of Integer-Forcing (IF) precoding. This problem, however, is known to be NP-hard, presenting a fundamental computational bottleneck. In this paper, we reveal that the solution space of this problem admits an intrinsic geometric structure: it can be partitioned into a finite number of conical regions, each associated with a distinct full-rank integer matrix $\mathbf{A}$. Leveraging this decomposition, we transform the NP-hard problem into a search over these regions and propose the Multi-Cone Nested Stochastic Pattern Search (MCN-SPS) algorithm. Our main theoretical result is that MCN-SPS finds a near-optimal solution with a computational complexity of $\mathcal{O}\left(K^4\log K\log_2(r_0)\right)$, which is polynomial in the number of users $K$. Numerical simulations corroborate the theoretical analysis and demonstrate the algorithm's efficacy.