Fellow, IEEE
Abstract:The transformative power of artificial intelligence (AI) and machine learning (ML) is recognized as a key enabler for sixth generation (6G) mobile networks by both academia and industry. Research on AI/ML in mobile networks has been ongoing for years, and the 3rd generation partnership project (3GPP) launched standardization efforts to integrate AI into mobile networks. However, a comprehensive review of the current status and challenges of the standardization of AI/ML for mobile networks is still missing. To this end, we provided a comprehensive review of the standardization efforts by 3GPP on AI/ML for mobile networks. This includes an overview of the general AI/ML framework, representative use cases (i.e., CSI feedback, beam management and positioning), and corresponding evaluation matrices. We emphasized the key research challenges on dataset preparation, generalization evaluation and baseline AI/ML models selection. Using CSI feedback as a case study, given the test dataset 2, we demonstrated that the pre-training-fine-tuning paradigm (i.e., pre-training using dataset 1 and fine-tuning using dataset 2) outperforms training on dataset 2. Moreover, we observed the highest performance enhancements in Transformer-based models through fine-tuning, showing its great generalization potential at large floating-point operations (FLOPs). Finally, we outlined future research directions for the application of AI/ML in mobile networks.
Abstract:Beamforming (BF) is essential for enhancing system capacity in fifth generation (5G) and beyond wireless networks, yet exhaustive beam training in ultra-massive multiple-input multiple-output (MIMO) systems incurs substantial overhead. To address this challenge, we propose a deep learning based framework that leverages position-aware features to improve beam prediction accuracy while reducing training costs. The proposed approach uses spatial coordinate labels to supervise a position extraction branch and integrates the resulting representations with beam-domain features through a feature fusion module. A dual-branch RegNet architecture is adopted to jointly learn location related and communication features for beam prediction. Two fusion strategies, namely adaptive fusion and adversarial fusion, are introduced to enable efficient feature integration. The proposed framework is evaluated on datasets generated by the DeepMIMO simulator across four urban scenarios at 3.5 GHz following 3GPP specifications, where both reference signal received power and user equipment location information are available. Simulation results under both in-distribution and out-of-distribution settings demonstrate that the proposed approach consistently outperforms traditional baselines and achieves more accurate and robust beam prediction by effectively incorporating positioning information.
Abstract:Acquiring channel state information (CSI) through traditional methods, such as channel estimation, is increasingly challenging for the emerging sixth generation (6G) mobile networks due to high overhead. To address this issue, channel extrapolation techniques have been proposed to acquire complete CSI from a limited number of known CSIs. To improve extrapolation accuracy, environmental information, such as visual images or radar data, has been utilized, which poses challenges including additional hardware, privacy and multi-modal alignment concerns. To this end, this paper proposes a novel channel extrapolation framework by leveraging environment-related multi-path characteristics induced directly from CSI without integrating additional modalities. Specifically, we propose utilizing the multi-path characteristics in the form of power-delay profile (PDP), which is acquired using a CSI-to-PDP module. CSI-to-PDP module is trained in an AE-based framework by reconstructing the PDPs and constraining the latent low-dimensional features to represent the CSI. We further extract the total power & power-weighted delay of all the identified paths in PDP as the multi-path information. Building on this, we proposed a MAE architecture trained in a self-supervised manner to perform channel extrapolation. Unlike standard MAE approaches, our method employs separate encoders to extract features from the masked CSI and the multi-path information, which are then fused by a cross-attention module. Extensive simulations demonstrate that this framework improves extrapolation performance dramatically, with a minor increase in inference time (around 0.1 ms). Furthermore, our model shows strong generalization capabilities, particularly when only a small portion of the CSI is known, outperforming existing benchmarks.
Abstract:In the evolving landscape of sixth-generation (6G) mobile communication, multiple-input multiple-output (MIMO) systems are incorporating an unprecedented number of antenna elements, advancing towards Extremely large-scale multiple-input-multiple-output (XL-MIMO) systems. This enhancement significantly increases the spatial degrees of freedom, offering substantial benefits for wireless positioning. However, the expansion of the near-field range in XL-MIMO challenges the traditional far-field assumptions used in previous MIMO models. Among various configurations, uniform circular arrays (UCAs) demonstrate superior performance by maintaining constant angular resolution, unlike linear planar arrays. Addressing how to leverage the expanded aperture and harness the near-field effects in XL-MIMO systems remains an area requiring further investigation. In this paper, we introduce an attention-enhanced deep learning approach for precise positioning. We employ a dual-path channel attention mechanism and a spatial attention mechanism to effectively integrate channel-level and spatial-level features. Our comprehensive simulations show that this model surpasses existing benchmarks such as attention-based positioning networks (ABPN), near-field positioning networks (NFLnet), convolutional neural networks (CNN), and multilayer perceptrons (MLP). The proposed model achieves superior positioning accuracy by utilizing covariance metrics of the input signal. Also, simulation results reveal that covariance metric is advantageous for positioning over channel state information (CSI) in terms of positioning accuracy and model efficiency.
Abstract:To meet the requirements for managing unauthorized UAVs in the low-altitude economy, a multi-modal UAV trajectory prediction method based on the fusion of LiDAR and millimeter-wave radar information is proposed. A deep fusion network for multi-modal UAV trajectory prediction, termed the Multi-Modal Deep Fusion Framework, is designed. The overall architecture consists of two modality-specific feature extraction networks and a bidirectional cross-attention fusion module, aiming to fully exploit the complementary information of LiDAR and radar point clouds in spatial geometric structure and dynamic reflection characteristics. In the feature extraction stage, the model employs independent but structurally identical feature encoders for LiDAR and radar. After feature extraction, the model enters the Bidirectional Cross-Attention Mechanism stage to achieve information complementarity and semantic alignment between the two modalities. To verify the effectiveness of the proposed model, the MMAUD dataset used in the CVPR 2024 UG2+ UAV Tracking and Pose-Estimation Challenge is adopted as the training and testing dataset. Experimental results show that the proposed multi-modal fusion model significantly improves trajectory prediction accuracy, achieving a 40% improvement compared to the baseline model. In addition, ablation experiments are conducted to demonstrate the effectiveness of different loss functions and post-processing strategies in improving model performance. The proposed model can effectively utilize multi-modal data and provides an efficient solution for unauthorized UAV trajectory prediction in the low-altitude economy.
Abstract:With the development of the sixth-generation (6G) communication system, Channel State Information (CSI) plays a crucial role in improving network performance. Traditional Channel Charting (CC) methods map high-dimensional CSI data to low-dimensional spaces to help reveal the geometric structure of wireless channels. However, most existing CC methods focus on learning static geometric structures and ignore the dynamic nature of the channel over time, leading to instability and poor topological consistency of the channel charting in complex environments. To address this issue, this paper proposes a novel time-series channel charting approach based on the integration of Long Short-Term Memory (LSTM) networks and Auto encoders (AE) (LSTM-AE-CC). This method incorporates a temporal modeling mechanism into the traditional CC framework, capturing temporal dependencies in CSI using LSTM and learning continuous latent representations with AE. The proposed method ensures both geometric consistency of the channel and explicit modeling of the time-varying properties. Experimental results demonstrate that the proposed method outperforms traditional CC methods in various real-world communication scenarios, particularly in terms of channel charting stability, trajectory continuity, and long-term predictability.
Abstract:Interpretability is significant in computational pathology, leading to the development of multimodal information integration from histopathological image and corresponding text data.However, existing multimodal methods have limited interpretability due to the lack of high-quality dataset that support explicit reasoning and inference and simple reasoning process.To address the above problems, we introduce a novel multimodal pathology large language model with strong reasoning capabilities.To improve the generation of accurate and contextually relevant textual descriptions, we design a semantic reward strategy integrated with group relative policy optimization.We construct a high-quality pathology visual question answering (VQA) dataset, specifically designed to support complex reasoning tasks.Comprehensive experiments conducted on this dataset demonstrate that our method outperforms state-of-the-art methods, even when trained with only 20% of the data.Our method also achieves comparable performance on downstream zero-shot image classification task compared with CLIP.
Abstract:Predicting pathloss by considering the physical environment is crucial for effective wireless network planning. Traditional methods, such as ray tracing and model-based approaches, often face challenges due to high computational complexity and discrepancies between models and real-world environments. In contrast, deep learning has emerged as a promising alternative, offering accurate path loss predictions with reduced computational complexity. In our research, we introduce a ResNet-based model designed to enhance path loss prediction. We employ innovative techniques to capture key features of the environment by generating transmission (Tx) and reception (Rx) depth maps, as well as a distance map from the geographic data. Recognizing the significant attenuation caused by signal reflection and diffraction, particularly at high frequencies, we have developed a weighting map that emphasizes the areas adjacent to the direct path between Tx and Rx for path loss prediction. {Extensive simulations demonstrate that our model outperforms PPNet, RPNet, and Vision Transformer (ViT) by 1.2-3.0 dB using dataset of ITU challenge 2024 and ICASSP 2023. In addition, the floating point operations (FLOPs) of the proposed model is 60\% less than those of benchmarks.} Additionally, ablation studies confirm that the inclusion of the weighting map significantly enhances prediction performance.
Abstract:Self-Supervised Learning (SSL) has emerged as a key technique in machine learning, tackling challenges such as limited labeled data, high annotation costs, and variable wireless channel conditions. It is essential for developing Channel Foundation Models (CFMs), which extract latent features from channel state information (CSI) and adapt to different wireless settings. Yet, existing CFMs have notable drawbacks: heavy reliance on scenario-specific data hinders generalization, they focus on single/dual tasks, and lack zero-shot learning ability. In this paper, we propose CSI-MAE, a generalized CFM leveraging masked autoencoder for cross-scenario generalization. Trained on 3GPP channel model datasets, it integrates sensing and communication via CSI perception and generation, proven effective across diverse tasks. A lightweight decoder finetuning strategy cuts training costs while maintaining competitive performance. Under this approach, CSI-MAE matches or surpasses supervised models. With full-parameter finetuning, it achieves the state-of-the-art performance. Its exceptional zero-shot transferability also rivals supervised techniques in cross-scenario applications, driving wireless communication innovation.
Abstract:CSI extrapolation is an effective method for acquiring channel state information (CSI), essential for optimizing performance of sixth-generation (6G) communication systems. Traditional channel estimation methods face scalability challenges due to the surging overhead in emerging high-mobility, extremely large-scale multiple-input multiple-output (EL-MIMO), and multi-band systems. CSI extrapolation techniques mitigate these challenges by using partial CSI to infer complete CSI, significantly reducing overhead. Despite growing interest, a comprehensive review of state-of-the-art (SOTA) CSI extrapolation techniques is lacking. This paper addresses this gap by comprehensively reviewing the current status, challenges, and future directions of CSI extrapolation for the first time. Firstly, we analyze the performance metrics specific to CSI extrapolation in 6G, including extrapolation accuracy, adaption to dynamic scenarios and algorithm costs. We then review both model-driven and artificial intelligence (AI)-driven approaches for time, frequency, antenna, and multi-domain CSI extrapolation. Key insights and takeaways from these methods are summarized. Given the promise of AI-driven methods in meeting performance requirements, we also examine the open-source channel datasets and simulators that could be used to train high-performance AI-driven CSI extrapolation models. Finally, we discuss the critical challenges of the existing research and propose perspective research opportunities.