Abstract:Semantic communication (SemCom) powered by generative artificial intelligence enables highly efficient and reliable information transmission. However, it still necessitates the transmission of substantial amounts of data when dealing with complex scene information. In contrast, the stacked intelligent metasurface (SIM), leveraging wave-domain computing, provides a cost-effective solution for directly imaging complex scenes. Building on this concept, we propose an innovative SIM-aided multi-modal SemCom system. Specifically, an SIM is positioned in front of the transmit antenna for transmitting visual semantic information of complex scenes via imaging on the uniform planar array at the receiver. Furthermore, the simple scene description that contains textual semantic information is transmitted via amplitude-phase modulation over electromagnetic waves. To simultaneously transmit multi-modal information, we optimize the amplitude and phase of meta-atoms in the SIM using a customized gradient descent algorithm. The optimization aims to gradually minimize the mean squared error between the normalized energy distribution on the receiver array and the desired pattern corresponding to the visual semantic information. By combining the textual and visual semantic information, a conditional generative adversarial network is used to recover the complex scene accurately. Extensive numerical results verify the effectiveness of the proposed multi-modal SemCom system in reducing bandwidth overhead as well as the capability of the SIM for imaging the complex scene.
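The optimization loop described above can be illustrated with a heavily simplified sketch. It assumes a single phase-only layer, a made-up propagation matrix `W`, a finite-difference gradient, and a backtracking step size; none of these details come from the abstract, which optimizes both amplitude and phase across stacked layers with a customized algorithm.

```python
import cmath
import math

NUM_ATOMS, NUM_RX = 8, 4  # hypothetical toy sizes

def distance(m, n):
    # Hypothetical geometry: layer and array 3 wavelengths apart.
    return math.hypot(3.0, m - 0.5 * n)

# Assumed free-space-like propagation coefficients (distances in wavelengths).
W = [[cmath.exp(2j * math.pi * distance(m, n)) / distance(m, n)
      for n in range(NUM_ATOMS)] for m in range(NUM_RX)]

def normalized_energy(theta, W):
    """Normalized energy on the receiver array for one phase-only layer."""
    field = [sum(W[m][n] * cmath.exp(1j * theta[n]) for n in range(len(theta)))
             for m in range(len(W))]
    energy = [abs(v) ** 2 for v in field]
    total = sum(energy)
    return [e / total for e in energy]

def mse(theta, W, target):
    pattern = normalized_energy(theta, W)
    return sum((p - t) ** 2 for p, t in zip(pattern, target)) / len(target)

def gradient(theta, W, target, h=1e-6):
    """Central finite-difference gradient of the MSE w.r.t. each phase."""
    grad = []
    for n in range(len(theta)):
        up = theta[:n] + [theta[n] + h] + theta[n + 1:]
        down = theta[:n] + [theta[n] - h] + theta[n + 1:]
        grad.append((mse(up, W, target) - mse(down, W, target)) / (2 * h))
    return grad

def optimize(theta, W, target, iters=200, step=1.0):
    """Gradient descent that only accepts steps which lower the MSE."""
    loss = mse(theta, W, target)
    for _ in range(iters):
        grad = gradient(theta, W, target)
        trial = step
        while trial > 1e-8:
            candidate = [t - trial * g for t, g in zip(theta, grad)]
            cand_loss = mse(candidate, W, target)
            if cand_loss < loss:
                theta, loss = candidate, cand_loss
                break
            trial /= 2  # backtrack until the loss decreases
    return theta, loss

target = [0.7, 0.1, 0.1, 0.1]  # desired normalized energy pattern (illustrative)
initial_loss = mse([0.0] * NUM_ATOMS, W, target)
theta_opt, final_loss = optimize([0.0] * NUM_ATOMS, W, target)
```

Because a step is accepted only when it lowers the error, `final_loss` can never exceed `initial_loss`, which mirrors the "gradually minimize" behaviour described in the abstract.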
Abstract:Hyperspectral imaging (HSI) has been widely used in agricultural applications for non-destructive estimation of plant nutrient composition and precise determination of nutritional elements in samples. Recently, 3D reconstruction methods have been used to create implicit neural representations of HSI scenes, which can help localize the target object's nutrient composition spatially and spectrally. Neural Radiance Field (NeRF) is a cutting-edge implicit representation that can render hyperspectral channel compositions of each spatial location from any viewing direction. However, it faces limitations in training time and rendering speed. In this paper, we propose Hyperspectral Gaussian Splatting (HS-GS), which combines the state-of-the-art 3D Gaussian Splatting (3DGS) with a diffusion model to enable explicit 3D reconstruction of hyperspectral scenes and novel view synthesis for the entire spectral range. To enhance the model's ability to capture fine-grained reflectance variations across the light spectrum and leverage correlations between adjacent wavelengths for denoising, we introduce a wavelength encoder to generate wavelength-specific spherical harmonics offsets. We also introduce a novel Kullback-Leibler divergence-based loss to mitigate the spectral distribution gap between the rendered image and the ground truth. A diffusion model is further applied to denoise the rendered images and generate photorealistic hyperspectral images. We present extensive evaluations on five diverse hyperspectral scenes from the Hyper-NeRF dataset to show the effectiveness of our proposed HS-GS framework. The results demonstrate that HS-GS achieves new state-of-the-art performance among all previously published methods. Code will be released upon publication.
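One plausible form of a Kullback-Leibler spectral loss is sketched below for a single pixel. The normalization of each spectrum to a probability distribution, the direction KL(reference || rendered), the smoothing constant `eps`, and the function name `spectral_kl` are all assumptions made for illustration; the abstract does not specify the loss.

```python
import math

def spectral_kl(rendered, reference, eps=1e-8):
    """KL(reference || rendered) between two spectra normalized to sum to 1."""
    p_total = sum(reference) + eps * len(reference)
    q_total = sum(rendered) + eps * len(rendered)
    kl = 0.0
    for r, g in zip(rendered, reference):
        p = (g + eps) / p_total  # ground-truth band share
        q = (r + eps) / q_total  # rendered band share
        kl += p * math.log(p / q)
    return kl

# Toy two-band spectra: the rendered pixel over-weights the second band.
gap = spectral_kl(rendered=[0.25, 0.75], reference=[0.5, 0.5])
no_gap = spectral_kl(rendered=[0.5, 0.5], reference=[0.5, 0.5])
```

Since both spectra are normalized first, this form penalizes differences in spectral shape rather than overall brightness.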
Abstract:3D semantic occupancy prediction is critical for achieving safe and reliable autonomous driving. Compared to camera-only perception systems, multi-modal pipelines, especially LiDAR-camera fusion methods, can produce more accurate and detailed predictions. Although most existing works utilize a dense grid-based representation, in which the entire 3D space is uniformly divided into discrete voxels, the emergence of 3D Gaussians provides a compact and continuous object-centric representation. In this work, we propose a multi-modal Gaussian-based semantic occupancy prediction framework utilizing 3D deformable attention, named GaussianFormer3D. We introduce a voxel-to-Gaussian initialization strategy to provide 3D Gaussians with geometry priors from LiDAR data, and design a LiDAR-guided 3D deformable attention mechanism for refining 3D Gaussians with LiDAR-camera fusion features in a lifted 3D space. We conduct extensive experiments on both on-road and off-road datasets, demonstrating that GaussianFormer3D achieves prediction accuracy comparable to state-of-the-art multi-modal fusion-based methods with reduced memory consumption and improved efficiency.
Abstract:The task of the multi-agent pathfinding (MAPF) problem is to navigate a team of agents from their start points to their goal points. However, this setup is unsuitable for assembly-line scenarios, which are periodic and run over long working hours. To address this issue, the study formalizes the streaming MAPF (S-MAPF) problem, which assumes that the agents in the same agent stream have periodic start times and share the same action sequence. The proposed solution, Agent Stream Conflict-Based Search (ASCBS), tackles this problem by incorporating cyclic vertex/edge constraints to handle conflicts. Additionally, this work explores the potential use of the disjoint splitting strategy within ASCBS. Experimental results indicate that ASCBS surpasses traditional MAPF solvers in terms of runtime for scenarios with prolonged working hours.
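The idea behind a cyclic vertex conflict can be sketched as follows. The sketch assumes two streams that share one release period, run in steady state, and release one agent per period along a fixed path; the function names and the offset parameters are hypothetical and are not taken from ASCBS itself.

```python
def occupancy(path, period, offset=0):
    """Map each vertex to the time residues (mod period) at which a stream occupies it."""
    slots = {}
    for step, vertex in enumerate(path):
        slots.setdefault(vertex, set()).add((step + offset) % period)
    return slots

def cyclic_vertex_conflicts(path_a, path_b, period, offset_a=0, offset_b=0):
    """Return (vertex, residue) pairs where both streams occupy a vertex in the same cycle slot."""
    occ_a = occupancy(path_a, period, offset_a)
    occ_b = occupancy(path_b, period, offset_b)
    conflicts = []
    for vertex in occ_a:
        if vertex in occ_b:
            for residue in sorted(occ_a[vertex] & occ_b[vertex]):
                conflicts.append((vertex, residue))
    return conflicts

# Two streams crossing at grid cell (0, 1), one agent released every 2 steps.
path_a = [(0, 0), (0, 1), (0, 2)]
path_b = [(1, 1), (0, 1), (-1, 1)]
clash = cyclic_vertex_conflicts(path_a, path_b, period=2)
shifted = cyclic_vertex_conflicts(path_a, path_b, period=2, offset_b=1)
```

Because every agent in a stream repeats the same action sequence, a conflict depends only on the time step modulo the period, so checking one cycle covers the whole working horizon. In the example, delaying the second stream by one step moves its crossing into the other slot of the cycle and removes the conflict.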
Abstract:The increase in antenna apertures and transmission frequencies in next-generation wireless networks is catalyzing advancements in near-field communications (NFC). In this paper, we investigate secure transmission in near-field multi-user multiple-input single-output (MU-MISO) scenarios. Specifically, with the advent of extremely large-scale antenna arrays (ELAA) applied in the NFC regime, the spatial degrees of freedom in the channel matrix are significantly enhanced. This creates an expanded null space that can be exploited for designing secure communication schemes. Motivated by this observation, we propose a near-field dynamic hybrid beamforming architecture incorporating artificial noise, which effectively disrupts eavesdroppers at any undesired positions, even in the absence of their channel state information (CSI). Furthermore, we comprehensively analyze the dynamic precoder's performance in terms of the average signal-to-interference-plus-noise ratio, achievable rate, secrecy capacity, secrecy outage probability, and the size of the secrecy zone. In contrast to far-field secure transmission techniques that only enhance security in the angular dimension, the proposed algorithm exploits the unique properties of spherical wave characteristics in NFC to achieve secure transmission in both the angular and distance dimensions. Remarkably, the proposed algorithm is applicable to arbitrary modulation types and array configurations. Numerical results demonstrate that the proposed method achieves approximately 20% higher rate capacity than the zero-forcing and weighted minimum mean squared error precoders.
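The null-space idea can be illustrated with a minimal single-user sketch: artificial noise is projected so that it vanishes at the legitimate user while still reaching other channels. The channel vectors `h` and `g`, the noise vector, and the restriction to one user are illustrative assumptions; the abstract's architecture is multi-user and hybrid, which this sketch does not reproduce.

```python
def hermitian_inner(a, b):
    """Return a^H b for two complex vectors given as lists."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def project_to_null_space(noise, h):
    """Remove the component of `noise` that the channel h would deliver (h^H z = 0 afterwards)."""
    scale = hermitian_inner(h, noise) / hermitian_inner(h, h)
    return [z - scale * hi for z, hi in zip(noise, h)]

# Hypothetical 4-antenna channels: h for the legitimate user, g for an eavesdropper.
h = [1 + 1j, 2 - 1j, 0.5j, -1 + 0j]
g = [1 + 0j, 1j, -1 + 0j, 2 + 1j]
raw_noise = [0.3 - 0.2j, 1.0 + 0j, -0.7j, 0.4 + 0.9j]

artificial_noise = project_to_null_space(raw_noise, h)
leak_user = abs(hermitian_inner(h, artificial_noise))  # noise seen by the user
leak_eve = abs(hermitian_inner(g, artificial_noise))   # noise seen by the eavesdropper
```

The projection uses only the legitimate user's channel, which is why no eavesdropper CSI is required. With more antennas the null space grows, leaving more directions in which noise can be placed.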
Abstract:Reconfigurable intelligent surfaces (RIS) can reshape the characteristics of wireless channels by intelligently regulating the phase shifts of reflecting elements. Recently, various codebook schemes have been utilized to optimize the reflection coefficients (RCs); however, the selection of the optimal codeword is usually obtained by evaluating a metric of interest. In this letter, we propose a novel weighted design on the discrete Fourier transform (DFT) codebook to obtain the optimal RCs for RIS-assisted point-to-point multiple-input multiple-output (MIMO) systems. Specifically, we first introduce a channel training protocol where we configure the RIS RCs using the DFT codebook to obtain a set of observations through the uplink training process. Secondly, based on these observed samples, the Lagrange multiplier method is utilized to optimize the weights in an iterative manner, which could result in a higher channel capacity for assisting in the downlink data transmission. Thirdly, we investigate the effect of different codeword configuration orders on system performance and design an efficient codeword configuration method based on statistical channel state information (CSI). Finally, numerical simulations are provided to demonstrate the performance of the proposed scheme.
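For reference, a DFT codebook of the kind used in the training protocol can be generated as in the sketch below. The sign convention of the exponent and the helper names are assumptions; the weighting and Lagrange-multiplier steps of the proposed scheme are not shown.

```python
import cmath

def dft_codebook(n):
    """Return n codewords; codeword k has entries exp(-2j*pi*k*m/n) for m = 0..n-1."""
    return [[cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n)]
            for k in range(n)]

def inner(a, b):
    """Return a^H b for two complex vectors given as lists."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

codebook = dft_codebook(8)  # 8 reflecting elements, 8 training configurations
modulus = abs(codebook[3][5])                      # every entry is a pure phase shift
cross = abs(inner(codebook[1], codebook[2]))       # distinct codewords are orthogonal
self_norm = abs(inner(codebook[2], codebook[2]))   # squared norm equals n
```

Each entry has unit modulus, so every codeword is a valid phase-shift configuration, and the mutual orthogonality of the codewords means the training observations are taken along non-overlapping directions.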
Abstract:Intelligent surfaces represent a breakthrough technology capable of customizing the wireless channel cost-effectively. However, existing works generally focus on planar wavefronts, neglecting the near-field spherical wavefront characteristics caused by large array apertures and high operating frequencies in the terahertz (THz) band. Additionally, the single-layer reconfigurable intelligent surface (RIS) lacks the signal processing ability to mitigate the computational complexity at the base station (BS). To address these issues, we introduce a novel stacked intelligent metasurface (SIM) composed of an array of programmable metasurface layers. The SIM aims to substitute conventional digital baseband architecture to execute computing tasks with ultra-low processing delay, albeit with a reduced number of radio-frequency (RF) chains and low-resolution digital-to-analog converters. In this paper, we present a SIM-aided multiuser multiple-input single-output (MU-MISO) near-field system, where the SIM is integrated into the BS to perform beamfocusing in the wave domain and customize an end-to-end channel with minimized inter-user interference. Finally, the numerical results demonstrate that near-field communication achieves superior spatial gain over the far-field, and the SIM effectively suppresses inter-user interference as the wireless signals propagate through it.
Abstract:Named entity recognition is an important task when constructing knowledge bases from unstructured data sources. Whereas entity detection methods mostly rely on extensive training data, Large Language Models (LLMs) have paved the way towards approaches that rely on zero-shot learning (ZSL) or few-shot learning (FSL) by taking advantage of the capabilities LLMs acquired during pretraining. Specifically, in very specialized scenarios where large-scale training data is not available, ZSL/FSL open new opportunities. This paper follows this recent trend and investigates the potential of leveraging LLMs in such scenarios to automatically detect datasets and software within textual content from GitHub repositories. While existing methods focus solely on named entities, this study aims to broaden the scope by incorporating resources such as repositories and online hubs where entities are also represented by URLs. The study explores different FSL prompt learning approaches to enhance the LLMs' ability to identify dataset and software mentions within repository texts. Through analyses of LLM effectiveness and learning strategies, this paper offers insights into the potential of advanced language models for automated entity detection.
Abstract:We present a morphological-symmetry-equivariant heterogeneous graph neural network, namely MS-HGNN, for robotic dynamics learning, that integrates robotic kinematic structures and morphological symmetries into a single graph network. These structural priors are embedded into the learning architecture as constraints, ensuring high generalizability, sample and model efficiency. The proposed MS-HGNN is a versatile and general architecture that is applicable to various multi-body dynamic systems and a wide range of dynamics learning problems. We formally prove the morphological-symmetry-equivariant property of our MS-HGNN and validate its effectiveness across multiple quadruped robot learning problems using both real-world and simulated data. Our code is made publicly available at https://github.com/lunarlab-gatech/MorphSym-HGNN/.
Abstract:We present a Morphology-Informed Heterogeneous Graph Neural Network (MI-HGNN) for learning-based contact perception. The architecture and connectivity of the MI-HGNN are constructed from the robot morphology, in which nodes and edges are robot joints and links, respectively. By incorporating the morphology-informed constraints into a neural network, we improve a learning-based approach using model-based knowledge. We apply the proposed MI-HGNN to two contact perception problems, and conduct extensive experiments using both real-world and simulated data collected using two quadruped robots. Our experiments demonstrate the superiority of our method in terms of effectiveness, generalization ability, model efficiency, and sample efficiency. Our MI-HGNN improved the performance of a state-of-the-art model that leverages robot morphological symmetry by 8.4% with only 0.21% of its parameters. Although MI-HGNN is applied to contact perception problems for legged robots in this work, it can be seamlessly applied to other types of multi-body dynamical systems and has the potential to improve other robot learning frameworks. Our code is made publicly available at https://github.com/lunarlab-gatech/Morphology-Informed-HGNN.