Sherman
Abstract: The rapid advancement of communication technologies has driven the evolution of communication networks toward both high-dimensional resource utilization and multifunctional integration. This growing complexity poses significant challenges in designing communication networks that satisfy the increasingly stringent quality-of-service and time-sensitivity requirements of mobile applications in dynamic environments. Graph neural networks (GNNs) have emerged as fundamental deep learning (DL) models for complex communication networks. GNNs not only improve feature extraction over network topologies but also enhance scalability and facilitate distributed computation. However, most existing GNNs follow a traditional passive learning framework, which may fail to meet the needs of increasingly diverse wireless systems. This survey proposes employing agentic artificial intelligence (AI) to organize and integrate GNNs, enabling scenario- and task-aware implementation toward edge general intelligence. To capture the full capability of GNNs, we holistically review recent applications of GNNs in wireless communications and networking. Specifically, we focus on the alignment between graph representations and network topologies, and between neural architectures and wireless tasks. We first provide an overview of GNNs based on prominent neural architectures, followed by the concept of agentic GNNs. Then, we summarize and compare GNN applications for conventional systems and emerging technologies, including physical, MAC, and network layer designs, integrated sensing and communication (ISAC), reconfigurable intelligent surfaces (RIS), and cell-free network architectures. We further propose a large language model (LLM) framework as an intelligent question-answering agent, leveraging this survey as a local knowledge base to enable GNN-related responses tailored to wireless communication research.
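To make the feature-extraction and scalability claims concrete, here is a minimal sketch of one message-passing layer with mean aggregation, applied to a toy interference graph in which nodes are links and edges connect mutually interfering links. The graph, the scalar features, and the fixed weights `w_self` and `w_neigh` are hypothetical; a trained GNN would learn its weights and typically use vector-valued features.

```python
def message_passing_layer(features, neighbors, w_self, w_neigh):
    """One mean-aggregation GNN layer with ReLU activation (illustrative sketch).

    features:  dict node -> float (scalar node feature, e.g. a channel gain)
    neighbors: dict node -> list of adjacent nodes (e.g. interfering links)
    w_self, w_neigh: weights shared by all nodes (hypothetical, fixed not learned)
    """
    updated = {}
    for node, x in features.items():
        nbrs = neighbors.get(node, [])
        # Permutation-invariant aggregation: mean of neighbor features.
        agg = sum(features[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
        # Combine own feature with the aggregated message, then apply ReLU.
        updated[node] = max(0.0, w_self * x + w_neigh * agg)
    return updated

# Toy 3-link interference graph: link 0 interferes with links 1 and 2.
features = {0: 1.0, 1: 0.5, 2: 2.0}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
out = message_passing_layer(features, neighbors, w_self=1.0, w_neigh=-0.5)
print(out)  # {0: 0.375, 1: 0.0, 2: 1.5}
```

Because every node applies the same weights and a permutation-invariant mean, the layer works unchanged on graphs of any size, and each node needs only its neighbors' features, which is the property behind the scalability and distributed-computation benefits.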
Abstract: Integrated sensing and communication (ISAC) technology plays a critical role in future intelligent transportation systems by enabling vehicles to perceive and reconstruct the surrounding environment through the reuse of wireless signals, thereby reducing or even eliminating the need for additional sensors such as LiDAR or radar. However, existing ISAC-based reconstruction methods often lack the ability to track dynamic scenes with sufficient accuracy and temporal consistency, which limits their real-world applicability. To address this limitation, we propose a deep-learning-based framework for vehicular environment reconstruction using ISAC channels. We first establish a joint channel-environment dataset based on multimodal measurements from real-world urban street scenarios. Then, a multistage deep learning network is developed to reconstruct the environment. Specifically, a scene decoder identifies the environmental context, such as buildings and trees; a cluster center decoder predicts coarse spatial layouts by localizing dominant scattering centers; and a point cloud decoder recovers the fine-grained geometry and structure of the surrounding environment. Experimental results demonstrate that the proposed method achieves high-quality dynamic environment reconstruction, with a Chamfer Distance of 0.29 and an F-Score@1% of 0.87. In addition, a complexity analysis demonstrates the efficiency and practical applicability of the method in real-time scenarios. This work provides a pathway toward low-cost, ISAC-based environment reconstruction for future intelligent transportation.
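The two reported metrics can be made concrete with a small pure-Python sketch. Definitions of Chamfer Distance differ across papers (squared vs. unsquared distances, sum vs. average of the two directions), so the symmetric variant below is an assumption rather than the authors' evaluation code. Likewise, "F-Score@1%" presumably ties the threshold `tau` to 1% of some reference length; here `tau` is simply passed in.

```python
import math

def _nn_dist(p, cloud):
    # Euclidean distance from point p to its nearest neighbor in cloud.
    return min(math.dist(p, q) for q in cloud)

def chamfer_distance(pred, gt):
    # Assumed symmetric variant: mean NN distance pred->gt plus gt->pred.
    d_pg = sum(_nn_dist(p, gt) for p in pred) / len(pred)
    d_gp = sum(_nn_dist(g, pred) for g in gt) / len(gt)
    return d_pg + d_gp

def f_score(pred, gt, tau):
    # Precision: share of predicted points within tau of the ground truth.
    precision = sum(_nn_dist(p, gt) <= tau for p in pred) / len(pred)
    # Recall: share of ground-truth points within tau of the prediction.
    recall = sum(_nn_dist(g, pred) <= tau for g in gt) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]  # one misplaced point
print(round(chamfer_distance(pred, gt), 4))  # 0.6667
print(round(f_score(pred, gt, tau=0.5), 4))  # 0.6667
```

Lower Chamfer Distance and higher F-Score both indicate a closer match; the F-Score additionally reports what fraction of points fall within an explicit tolerance.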
Abstract: Cell-free massive multiple-input multiple-output (CF mMIMO) has emerged as a prominent candidate for future networks due to its ability to significantly enhance spectral efficiency by eliminating inter-cell interference. However, its practical deployment faces considerable challenges, such as high computational complexity and the difficulty of optimizing its complex processing. To address these challenges, this correspondence proposes a framework based on a sparse multi-dimensional graph neural network (SP-MDGNN), which sparsifies the connections between access points (APs) and user equipments (UEs) to significantly reduce computational complexity while maintaining high performance. In addition, the weighted minimum mean square error (WMMSE) algorithm is used as a benchmark to further analyze the trade-off between performance and complexity. Simulation results demonstrate that the sparse method achieves an optimal balance between performance and complexity, significantly reducing the computational complexity of the original MDGNN method at the cost of only a slight performance degradation, which provides insights for the practical deployment of CF mMIMO systems in large-scale networks.
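The abstract does not specify how SP-MDGNN decides which AP-UE connections to drop. One plausible rule, shown here purely as an assumption, is user-centric pruning: each UE keeps only the `k` APs with the strongest large-scale fading gains, so the bipartite AP-UE graph shrinks from M x U edges to k x U edges. The function name, the gain matrix `beta`, and `k` are hypothetical.

```python
def sparsify_ap_ue_graph(beta, k):
    """Keep, for each UE, only the k APs with the largest large-scale gain.

    beta: list of lists, beta[m][u] = large-scale fading gain from AP m to UE u
    k:    number of APs retained per UE (hypothetical hyperparameter)
    Returns the set of retained (ap, ue) edges of the bipartite graph.
    """
    num_aps, num_ues = len(beta), len(beta[0])
    edges = set()
    for u in range(num_ues):
        # Rank APs by their gain toward UE u, strongest first.
        ranked = sorted(range(num_aps), key=lambda m: beta[m][u], reverse=True)
        edges.update((m, u) for m in ranked[:k])
    return edges

beta = [[0.9, 0.1], [0.5, 0.8], [0.2, 0.7], [0.05, 0.3]]  # 4 APs x 2 UEs
edges = sparsify_ap_ue_graph(beta, k=2)
print(sorted(edges))  # [(0, 0), (1, 0), (1, 1), (2, 1)]
```

Since the per-layer cost of message passing on the AP-UE graph grows with the number of edges, pruning of this kind (here 4 of 8 edges kept) is one way a sparse GNN can cut complexity; the actual SP-MDGNN rule may differ.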
Abstract: To enable next-generation wireless communication networks with modest spectrum availability, multiple-input multiple-output (MIMO) technology needs to evolve further. In this paper, we introduce a promising next-generation wireless communication concept: flexible MIMO technology, that is, MIMO with flexible physical configurations and integrated applications. We categorize twelve representative flexible MIMO technologies into three major classes: those based on flexible deployment characteristics, those based on flexible geometry characteristics, and those based on flexible real-time modifications. We then provide a comprehensive overview of their fundamental characteristics, potential, and challenges. Furthermore, we present three vital enablers for flexible MIMO technology: efficient channel state information (CSI) acquisition schemes, low-complexity beamforming design, and explainable artificial intelligence (AI)-enabled optimization. Within these areas, eight critical sub-enabling technologies are discussed in detail. Finally, we present two case studies, pre-optimized irregular arrays and cell-free movable antennas, which showcase the significant potential of flexible MIMO technologies to enhance system capacity.
Abstract: Recently, a novel flexible-antenna technology called pinching antennas has attracted growing academic interest. By inserting discrete dielectric materials, pinching antennas can be activated at arbitrary points along waveguides, allowing flexible customization of the large-scale path loss. This paper investigates a multi-waveguide pinching-antenna integrated sensing and communications (ISAC) system, where transmit pinching antennas (TPAs) and receive pinching antennas (RPAs) coordinate to simultaneously detect one potential target and serve one downlink user. We formulate a communication rate maximization problem subject to a radar signal-to-noise ratio (SNR) requirement, a transmit power budget, and the allowable movement region of the TPAs, by jointly optimizing the TPA locations and the transmit beamforming. To address the non-convexity of the problem, we propose a novel fine-tuning approximation method to reformulate it into a tractable form, followed by a successive convex approximation (SCA)-based algorithm that obtains the solution efficiently. Extensive simulations validate both the system design and the proposed algorithm. The results show that the proposed method achieves near-optimal performance compared with a computationally intensive exhaustive-search benchmark, and that pinching-antenna ISAC systems exhibit a communication-sensing trade-off distinct from that of conventional systems.
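The idea that moving a pinching antenna along its waveguide customizes large-scale path loss can be illustrated numerically. The sketch assumes a single waveguide along the x-axis at height `height`, a user on the ground plane, isolated free-space (Friis-type) propagation, and an assumed 28 GHz carrier; it ignores in-waveguide phase shifts, beamforming, and the radar SNR constraint, so it shows only why antenna position matters, not the paper's joint SCA design. The one-dimensional grid search is a toy analogue of an exhaustive-search benchmark.

```python
import math

def pinching_gain(x_pa, user, height, wavelength):
    """Free-space power gain from a pinching antenna at (x_pa, 0, height) to a user.

    Assumes the waveguide lies along the x-axis at y = 0 and a Friis-type model
    |h|^2 = (wavelength / (4*pi*dist))**2 with isotropic antennas.
    """
    x_u, y_u = user
    dist = math.sqrt((x_pa - x_u) ** 2 + y_u ** 2 + height ** 2)
    return (wavelength / (4 * math.pi * dist)) ** 2

def best_position(user, height, wavelength, x_min, x_max, steps=1000):
    # Exhaustive grid search over the allowed movement region of the antenna.
    grid = [x_min + (x_max - x_min) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda x: pinching_gain(x, user, height, wavelength))

lam = 3e8 / 28e9   # wavelength at an assumed 28 GHz carrier
user = (3.0, 2.0)  # user ground position (x, y) in meters
print(best_position(user, 3.0, lam, 0.0, 10.0))  # 3.0: in line with the user's x
print(best_position(user, 3.0, lam, 5.0, 10.0))  # 5.0: clipped to the movement region
```

Under this model the gain scales as 1/dist^2, so placing the antenna at the point of the waveguide nearest the user minimizes path loss; a movement-region constraint simply clips that position.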
Abstract: Developing generalist agents that can operate across diverse tasks, environments, and physical embodiments is a grand challenge in robotics and artificial intelligence. In this work, we focus on the axis of embodiment and investigate embodiment scaling laws, i.e., the hypothesis that increasing the number of training embodiments improves generalization to unseen ones. Using robot locomotion as a test bed, we procedurally generate a dataset of approximately 1,000 varied embodiments, spanning humanoids, quadrupeds, and hexapods, and train generalist policies capable of handling diverse observation and action spaces on random subsets. We find that increasing the number of training embodiments improves generalization to unseen ones, and that scaling embodiments is more effective for embodiment-level generalization than scaling data on small, fixed sets of embodiments. Notably, our best policy, trained on the full dataset, transfers zero-shot to novel embodiments in the real world, such as the Unitree Go2 and H1. These results represent a step toward general embodied intelligence, with potential relevance to adaptive control for configurable robots, co-design of morphology and control, and beyond.
Abstract: The stacked intelligent metasurface (SIM), comprising multiple layers of reconfigurable transmissive metasurfaces, is becoming an increasingly viable solution for future wireless communication systems. In this paper, we explore the integration of a SIM into a multi-antenna base station for downlink multi-user communications and present a realistic power consumption model for SIM-assisted systems. We focus on maximizing the energy efficiency (EE) through hybrid precoding design, i.e., jointly designing the base station digital precoding and the SIM wave-based beamforming. Due to the non-convexity and high complexity of the formulated problem, we employ the quadratic transformation method to reformulate the optimization problem and propose an alternating optimization (AO)-based joint precoding framework. Specifically, a successive convex approximation (SCA) algorithm is adopted for the base station precoding design. For the SIM wave-based beamforming, two algorithms are employed: a high-performance semidefinite programming (SDP) method and a low-complexity projected gradient ascent (PGA) algorithm. The results indicate that although the optimal numbers of SIM layers for maximizing the EE and the spectral efficiency differ, a design with 2 to 5 layers can achieve satisfactory performance for both. Finally, numerical results are presented to evaluate the effectiveness of the proposed hybrid precoding framework and to showcase the performance gains of the algorithm over benchmark schemes.
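The low-complexity PGA option hinges on a projection step that keeps every meta-atom's response on the unit circle. The toy below applies the same projected-gradient-ascent pattern to a deliberately simple single-layer objective f(theta) = |sum_n c_n theta_n|^2 with |theta_n| = 1, whose optimum (sum_n |c_n|)^2 is known in closed form. The coefficients `c`, the step size `mu`, and the iteration count are assumptions; the real multi-layer SIM energy-efficiency objective is considerably more involved.

```python
def pga_unit_modulus(c, mu=1.0, iters=200):
    """Projected gradient ascent for max |sum_n c[n]*theta[n]|^2 s.t. |theta[n]| = 1.

    A toy single-layer stand-in for SIM phase-shift design (assumption).
    """
    theta = [1 + 0j] * len(c)  # start with all phase shifts at 0 rad
    for _ in range(iters):
        s = sum(cn * tn for cn, tn in zip(c, theta))
        # Wirtinger gradient of |s|^2 w.r.t. conj(theta[n]) is s * conj(c[n]).
        stepped = [tn + mu * s * cn.conjugate() for cn, tn in zip(c, theta)]
        # Projection onto the unit-modulus set: keep only the phase
        # (fall back to the previous value in the degenerate zero case).
        theta = [z / abs(z) if abs(z) > 0 else tn for z, tn in zip(stepped, theta)]
    value = abs(sum(cn * tn for cn, tn in zip(c, theta))) ** 2
    return theta, value

c = [1 + 1j, 2 - 1j, -1 + 0.5j, 0.5j]
theta, value = pga_unit_modulus(c)
bound = sum(abs(cn) for cn in c) ** 2  # closed-form optimum for this toy problem
print(value, bound)
```

At convergence every term c_n theta_n shares one phase, which is exactly the co-phasing condition that attains the bound; each iteration costs only O(N), which is why PGA is the low-complexity alternative to SDP.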
Abstract: In frequency division duplex (FDD) multiple-input multiple-output (MIMO) wireless communication systems, the acquisition of downlink channel state information (CSI) is essential for maximizing spatial resource utilization and improving system spectral efficiency. In AI-based CSI feedback architectures built on traditional modular communication frameworks, the modules, including channel estimation (CE) and CSI compression and feedback, are designed separately, which leads to sub-optimal performance. In this paper, we propose an uplink-assisted joint CE and CSI feedback approach via deep learning for downlink CSI acquisition, which mitigates the performance degradation caused by distribution bias across separately trained modules in traditional modular communication frameworks. The proposed network adopts a deep joint source-channel coding (DJSCC) architecture to mitigate the cliff effect encountered in conventional separate source-channel coding. Furthermore, we exploit the uplink CSI as auxiliary information to enhance CSI reconstruction accuracy by leveraging the partial reciprocity between the uplink and downlink channels in FDD systems, without introducing additional overhead. The effectiveness of uplink CSI as auxiliary information and the necessity of an end-to-end multi-module joint training architecture are validated through comprehensive ablation and scalability experiments.
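A defining trait of DJSCC is that the learned encoder output is sent as power-normalized channel symbols without separate quantization and channel coding, so distortion grows gradually as SNR falls instead of collapsing below a threshold (the cliff effect). The sketch below isolates only that channel layer; the real-valued AWGN model, the unit power budget, the random stand-in latent, and the function names are assumptions, and the neural encoder/decoder and the uplink side information are omitted.

```python
import math
import random

def power_normalize(z, power=1.0):
    # Scale the latent symbols so their average power equals the budget.
    avg = sum(v * v for v in z) / len(z)
    scale = math.sqrt(power / avg)
    return [v * scale for v in z]

def awgn(z, snr_db, rng, power=1.0):
    # Real-valued AWGN with noise variance = power / SNR (linear scale).
    sigma = math.sqrt(power / 10 ** (snr_db / 10))
    return [v + rng.gauss(0.0, sigma) for v in z]

def latent_mse(snr_db, n=20000, seed=0):
    # Distortion of the received latent versus the transmitted latent.
    rng = random.Random(seed)
    z = power_normalize([rng.uniform(-1.0, 1.0) for _ in range(n)])
    y = awgn(z, snr_db, rng)
    return sum((a - b) ** 2 for a, b in zip(z, y)) / n

for snr in (0, 5, 10, 15):
    print(snr, round(latent_mse(snr), 4))
```

The printed distortion falls smoothly with SNR, roughly tracking the noise variance, which is the graceful behavior DJSCC aims for; a separate digital scheme would instead fail abruptly once the SNR dropped below what its channel code supports.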
Abstract: How can robots learn dexterous grasping skills efficiently and apply them adaptively based on user instructions? This work tackles two key challenges: efficient skill acquisition from limited human demonstrations and context-driven skill selection. We introduce AdaDexGrasp, a framework that learns a library of grasping skills from a single human demonstration per skill and selects the most suitable one using a vision-language model (VLM). To improve sample efficiency, we propose a trajectory-following reward that guides reinforcement learning (RL) toward states close to a human demonstration while allowing flexibility in exploration. To learn beyond the single demonstration, we employ curriculum learning, progressively increasing object pose variations to enhance robustness. At deployment, a VLM retrieves the appropriate skill based on user instructions, bridging low-level learned skills with high-level intent. We evaluate AdaDexGrasp in both simulation and real-world settings, showing that our approach significantly improves RL efficiency and enables learning human-like grasp strategies across varied object configurations. Finally, we demonstrate zero-shot transfer of our learned policies to a real-world PSYONIC Ability Hand, with a 90% success rate across objects, significantly outperforming the baseline.
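The abstract does not give the reward's formula, so the following is only a hypothetical sketch of what a trajectory-following reward with exploration slack, plus a pose curriculum, might look like. The exponential shaping, the matching window, the parameter values, and the linear schedule are all assumptions chosen for illustration, not AdaDexGrasp's actual design.

```python
import math

def trajectory_following_reward(state, demo, progress, window=2, alpha=5.0):
    """Hypothetical dense reward for staying close to one human demonstration.

    state:    current state as a tuple of floats (e.g. hand/object features)
    demo:     list of demonstration states with the same dimension
    progress: demo index the agent is expected to be near at this step
    window:   how many demo steps behind/ahead may be matched (exploration slack)
    alpha:    sharpness of the exponential distance shaping
    Returns (reward, matched demo index).
    """
    lo = max(0, progress - window)
    hi = min(len(demo) - 1, progress + window)
    # Match the nearest demo state inside the window rather than a fixed time
    # step, so the policy may move faster or slower than the human.
    best = min(range(lo, hi + 1), key=lambda i: math.dist(state, demo[i]))
    return math.exp(-alpha * math.dist(state, demo[best])), best

def curriculum_pose_range(step, total_steps, max_range):
    # Linearly widen the object-pose perturbation range during training.
    return max_range * min(1.0, step / total_steps)

demo = [(0.0, 0.0), (0.5, 0.1), (1.0, 0.3), (1.5, 0.6)]  # toy 2-D demo states
print(trajectory_following_reward((1.0, 0.3), demo, progress=1))  # (1.0, 2)
print(curriculum_pose_range(5000, 10000, max_range=0.2))
```

The window is what gives the "flexibility in exploration": the agent is rewarded for being near any nearby demo state rather than being forced into lockstep with the human's timing.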
Abstract: In recent years, high-speed train (HST) communications have developed rapidly to enhance the stability of train operations and improve passenger connectivity. However, as train speeds continue to increase, urgent technological innovations are needed to overcome challenges such as frequent handovers and significant Doppler effects. In this paper, we present a novel architecture featuring movable antennas (MAs) to fully exploit macro spatial diversity, enabling a cell-free (CF) massive multiple-input multiple-output (MIMO) system that supports HST communications. Considering the high likelihood of line-of-sight (LoS) transmission in HST scenarios, we derive the uplink spectral efficiency (SE) expression for the movable CF massive MIMO system. Moreover, an optimization problem is formulated to maximize the sum SE of the considered system by optimizing the positions of the antennas. Since the formulated problem is non-convex and highly non-linear, we develop an improved deep reinforcement learning algorithm based on proximal policy optimization (PPO) to solve it. Unlike traditional optimization approaches, which optimize variables separately and alternately, our improved PPO-based approach optimizes all the variables jointly. Simulation results demonstrate that movable CF massive MIMO effectively suppresses the negative impact of the Doppler effect in HST communications.
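The abstract does not describe what the authors changed in PPO, so the sketch below shows only the standard clipped surrogate objective that PPO-based methods start from. Treating a whole vector of antenna positions as a single action, whose log-probability enters `logp_new`, is how one policy update can adjust all variables jointly; that mapping, the function name, and `eps=0.2` are assumptions for illustration.

```python
import math

def ppo_clipped_objective(logp_new, logp_old, advantages, eps=0.2):
    """Standard PPO clipped surrogate (to be maximized), averaged over samples.

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the behavior policy; advantages: advantage estimates.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # probability ratio pi_new / pi_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Toy batch: one action with positive advantage, one with negative advantage.
obj = ppo_clipped_objective(
    logp_new=[math.log(1.5), math.log(0.5)],
    logp_old=[0.0, 0.0],
    advantages=[2.0, -1.0],
)
print(round(obj, 6))  # 0.8
```

Clipping caps how far a single update can move the policy (here the first sample contributes 1.2 x 2.0 instead of 1.5 x 2.0), which is what keeps PPO training stable on highly non-linear objectives such as the sum SE.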