Low Earth orbit (LEO) satellite communication systems have gained increasing attention as a crucial supplement to terrestrial wireless networks due to their extensive coverage. This letter presents a novel system design for LEO satellite systems that leverages stacked intelligent metasurface (SIM) technology. Specifically, a lightweight and energy-efficient SIM is mounted on the satellite to achieve multiuser beamforming directly in the electromagnetic wave domain, which substantially reduces the processing delay and computational load of the satellite compared to traditional digital beamforming. To overcome the challenge of obtaining instantaneous channel state information (CSI) at the transmitter and to maximize system performance, a joint power allocation and SIM phase shift optimization problem for maximizing the ergodic sum rate is formulated based on statistical CSI, and an alternating optimization (AO) algorithm is customized to solve it efficiently. Additionally, a user grouping method based on channel correlation and an antenna selection algorithm are proposed to further improve system performance. Simulation results demonstrate the effectiveness of the proposed SIM-based LEO satellite system design and the statistical CSI-based AO algorithm.
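The alternating-optimization pattern described above can be sketched in a few lines of Python. This is a toy single-layer surrogate (not the letter's multi-layer SIM or ergodic-rate model): classical water-filling handles the power step, and a monotone finite-difference ascent stands in for the phase-shift step; all function names, the objective, and the channel model are illustrative assumptions.

```python
import numpy as np

def sum_rate(p, theta, H):
    """Toy surrogate objective: sum_k log2(1 + p_k * g_k(theta)), where g_k is
    user k's effective gain through the phase-shifted metasurface."""
    g = np.abs(H @ np.exp(1j * theta)) ** 2
    return float(np.sum(np.log2(1.0 + p * g)))

def water_filling(g, p_total):
    """Classical water-filling power allocation over channel gains g."""
    g_sorted = np.sort(g)[::-1]
    for k in range(len(g), 0, -1):
        mu = (p_total + np.sum(1.0 / g_sorted[:k])) / k  # candidate water level
        if mu - 1.0 / g_sorted[k - 1] > 0:               # weakest active user ok?
            break
    return np.maximum(mu - 1.0 / g, 0.0)

def alternating_optimization(H, p_total, n_iters=30, lr=0.1, eps=1e-6):
    """Alternate (i) optimal power allocation for fixed phases and (ii) a
    finite-difference ascent step on the phases, accepted only if it improves."""
    K, M = H.shape
    theta, p = np.zeros(M), np.full(K, p_total / K)
    for _ in range(n_iters):
        g = np.abs(H @ np.exp(1j * theta)) ** 2
        p = water_filling(g, p_total)                    # power step
        base = sum_rate(p, theta, H)
        grad = np.array([(sum_rate(p, theta + eps * np.eye(M)[m], H) - base) / eps
                         for m in range(M)])
        cand = theta + lr * grad                         # phase step
        if sum_rate(p, cand, H) > base:                  # keep rate monotone
            theta = cand
    return p, theta
```

Because each power step is optimal for the current phases and each phase step is accepted only when it helps, the surrogate rate is nondecreasing across iterations, mirroring the convergence behavior usually sought from AO schemes.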
Emerging technologies such as holographic multiple-input multiple-output (HMIMO) and the stacked intelligent metasurface (SIM) are driving the development of wireless communication systems. Specifically, the SIM is physically constructed by stacking multiple layers of metasurfaces and has an architecture similar to an artificial neural network (ANN), allowing it to flexibly manipulate the electromagnetic waves propagating through it at the speed of light. This architecture enables the SIM to perform HMIMO precoding and combining in the wave domain, significantly reducing hardware cost and energy consumption. In this letter, we investigate the channel estimation problem in SIM-assisted multi-user HMIMO communication systems. Since the number of antennas at the base station (BS) is much smaller than the number of meta-atoms per layer of the SIM, acquiring channel state information (CSI) in SIM-assisted multi-user systems is challenging. To address this issue, we collect multiple copies of the uplink pilot signals that propagate through the SIM. Furthermore, we leverage the array geometry to identify the subspace that spans arbitrary spatial correlation matrices. Based on this partial knowledge of the channel statistics, a pair of subspace-based channel estimators is proposed. Additionally, we compute the mean square error (MSE) of the proposed channel estimators and optimize the phase shifts of the SIM to minimize the MSE. Numerical results illustrate the effectiveness of the proposed channel estimation schemes.
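To make the subspace idea concrete, here is a toy Python sketch: when the channel is known to lie in an r-dimensional subspace, restricting a least-squares estimate to that subspace discards the noise in the orthogonal complement and lowers the MSE. The random orthonormal basis and observation matrix below are stand-ins for the array-geometry-derived subspace and the SIM-propagated pilots of the paper; they are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)
N, r, T = 32, 6, 48   # channel dim, subspace rank, number of pilot observations

# Orthonormal basis U of the subspace known to contain the channel.
U, _ = np.linalg.qr(rng.standard_normal((N, r)) + 1j * rng.standard_normal((N, r)))

# Observation matrix A abstracts the pilots propagated through the surface.
A = (rng.standard_normal((T, N)) + 1j * rng.standard_normal((T, N))) / np.sqrt(2)

h = U @ (rng.standard_normal(r) + 1j * rng.standard_normal(r))   # true channel
n = 0.1 * (rng.standard_normal(T) + 1j * rng.standard_normal(T)) # pilot noise
y = A @ h + n                                                    # received pilots

h_ls = np.linalg.pinv(A) @ y              # plain least-squares estimate (N dims)
h_sub = U @ (np.linalg.pinv(A @ U) @ y)   # subspace-restricted estimate (r dims)

mse_ls = float(np.mean(np.abs(h_ls - h) ** 2))
mse_sub = float(np.mean(np.abs(h_sub - h) ** 2))
```

The subspace estimator solves a least-squares problem in only r unknowns instead of N, which is exactly why side information about channel statistics pays off when the BS has far fewer antennas than the surface has meta-atoms.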
This work presents a novel data-driven multi-layered planning and control framework for the safe navigation of a class of unmanned ground vehicles (UGVs) in the presence of unknown stationary obstacles and additive modeling uncertainties. The foundation of this framework is a novel robust model predictive planner, designed to generate optimal collision-free trajectories given an occupancy grid map, paired with an ancillary controller augmented to provide robustness against model uncertainties extracted from learning data. To tackle modeling discrepancies, we identify both matched (input discrepancies) and unmatched model residuals between the true and the nominal reduced-order models, using closed-loop tracking errors as training data. Utilizing conformal prediction, we extract probabilistic upper bounds on the unknown model residuals, which serve to construct a robustifying ancillary controller. Further, we determine the maximum tracking discrepancies under the augmented policy, also known as the robust control invariance tube, and formulate them as collision buffers. Employing a LiDAR-based occupancy map to characterize the environment, we construct a discrepancy-aware cost map that incorporates these collision buffers. This map is then integrated into a sampling-based model predictive path planner that generates optimal and safe trajectories that can be robustly tracked by the augmented ancillary controller in the presence of model mismatches. The effectiveness of the framework is experimentally validated for autonomous high-speed trajectory tracking in a cluttered environment with four different vehicle-terrain configurations. We also showcase the framework's versatility by reformulating it as a driver-assist program that provides collision avoidance corrections based on user joystick commands.
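The conformal-prediction step can be illustrated in a few lines: given calibration residual magnitudes, the split-conformal rank statistic yields a distribution-free threshold that a fresh residual exceeds with probability at most alpha. This is a generic split-conformal sketch under an exchangeability assumption, not the paper's exact construction; the Gaussian residuals are synthetic stand-ins.

```python
import numpy as np

def conformal_upper_bound(scores, alpha=0.1):
    """Split-conformal one-sided bound: with probability >= 1 - alpha, a fresh
    (exchangeable) residual magnitude falls below the returned threshold."""
    n = len(scores)
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)   # conformal rank
    return np.sort(scores)[k - 1]

rng = np.random.default_rng(2)
calib = np.abs(rng.normal(0.0, 0.05, size=500))   # |model residual| on calibration runs
bound = conformal_upper_bound(calib, alpha=0.1)

fresh = np.abs(rng.normal(0.0, 0.05, size=5000))  # residuals on new runs
coverage = float(np.mean(fresh <= bound))         # should be close to 0.9
```

Such a bound is what allows the residual to be treated as a known-size disturbance when constructing the robustifying controller and sizing collision buffers.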
The stacked intelligent metasurface (SIM) represents an advanced signal processing paradigm that enables over-the-air processing of electromagnetic waves at the speed of light. Its multi-layer structure offers customizable and increased computational capability compared to conventional single-layer reconfigurable intelligent surfaces and metasurface lenses. In this paper, we deploy a SIM to improve the performance of multi-user multiple-input single-output (MISO) wireless systems with low-complexity transmit radio frequency (RF) chains. In particular, we present an optimization formulation for the joint design of the SIM phase shifts and the transmit power allocation, which is efficiently solved via a customized deep reinforcement learning (DRL) approach that continuously observes pre-designed states of the SIM-parametrized smart wireless environment. The presented performance evaluation results showcase the proposed method's capability to effectively learn from the wireless environment while outperforming conventional precoding schemes under low transmit power conditions. Finally, a whitening process is presented to further augment the robustness of the proposed scheme.
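The whitening step is not specified further here; one common choice, shown as a hedged sketch below, is ZCA whitening of the observed state vectors so that inputs to the learning agent have zero mean and identity covariance. The function name and the choice of ZCA (over, say, PCA whitening) are assumptions for illustration.

```python
import numpy as np

def whiten(X, eps=1e-8):
    """ZCA whitening: center the observations and rotate/rescale them so that
    the sample covariance becomes (approximately) the identity matrix."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T  # ZCA whitening matrix
    return Xc @ W
```

Decorrelating and equalizing the scales of state features in this way is a standard trick for making learned policies less sensitive to the conditioning of raw observations.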
The rapid evolution of autonomous vehicles (AVs) has significantly influenced global transportation systems. In this context, we present ``Snow Lion'', an autonomous shuttle meticulously designed to revolutionize on-campus transportation, offering a safer and more efficient mobility solution for students, faculty, and visitors. The primary objective of this research is to enhance campus mobility by providing a reliable, efficient, and eco-friendly transportation solution that seamlessly integrates with existing infrastructure and meets the diverse needs of a university setting. To achieve this goal, we delve into the intricacies of the system design, encompassing sensing, perception, localization, planning, and control aspects. We evaluate the autonomous shuttle's performance in real-world scenarios, involving a 1146-kilometer road haul and the transportation of 442 passengers over a two-month period. These experiments demonstrate the effectiveness of our system and offer valuable insights into the intricate process of integrating an autonomous vehicle within campus shuttle operations. Furthermore, a thorough analysis of the lessons derived from this experience furnishes a valuable real-world case study, accompanied by recommendations for future research and development in the field of autonomous driving.
Named Entity Recognition (NER) models play a crucial role in various NLP tasks, including information extraction (IE) and text understanding. In academic writing, references to machine learning models and datasets are fundamental components of various computer science publications and necessitate accurate models for identification. Despite the advancements in NER, existing ground truth datasets do not treat fine-grained types like ML model and model architecture as separate entity types, and consequently, baseline models cannot recognize them as such. In this paper, we release a corpus of 100 manually annotated full-text scientific publications and a first baseline model for 10 entity types centered around ML models and datasets. In order to provide a nuanced understanding of how ML models and datasets are mentioned and utilized, our dataset also contains annotations for informal mentions like "our BERT-based model" or "an image CNN". You can find the ground truth dataset and code to replicate model training at https://data.gesis.org/gsap/gsap-ner.
Deep neural networks have been widely used in communication signal recognition and have achieved remarkable performance, but this superiority typically depends on massive labeled examples for supervised learning; training a deep neural network on a small dataset with few labels generally leads to overfitting and degraded performance. To this end, we develop a semi-supervised learning (SSL) method that effectively utilizes a large collection of more readily available unlabeled signal data to improve generalization. The proposed method relies largely on a novel implementation of consistency-based regularization, termed Swapped Prediction, which leverages strong data augmentation to perturb an unlabeled sample and then encourages the corresponding model prediction to stay close to that of the original sample, optimized with a scaled cross-entropy loss with swapped symmetry. Extensive experiments indicate that our proposed method achieves promising results for deep SSL of communication signal recognition.
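A hedged sketch of a swapped, scaled cross-entropy consistency term of the kind described (the paper's exact loss may differ): each of the two predictions, on the original and on the strongly augmented signal, serves in turn as the target distribution for the other, and the two cross-entropy terms are summed and scaled.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def swapped_consistency_loss(logits_orig, logits_aug, scale=1.0):
    """Symmetric ('swapped') cross-entropy between the model's predictions on an
    unlabeled sample and on its strongly augmented copy; `scale` weights the term."""
    p, q = softmax(logits_orig), softmax(logits_aug)
    ce_pq = -np.sum(p * np.log(q + 1e-12), axis=-1)   # p as target for q
    ce_qp = -np.sum(q * np.log(p + 1e-12), axis=-1)   # q as target for p
    return float(scale * np.mean(ce_pq + ce_qp))
```

The loss is minimized when the two predictive distributions agree, which is precisely the consistency pressure that lets unlabeled signals regularize the classifier; in practice one distribution per term would be detached from the gradient graph.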
Considerable research efforts have been devoted to the development of motion planning algorithms, which form a cornerstone of the autonomous driving system (ADS). However, obtaining interactive and safe trajectories for the ADS remains a formidable task, especially in scenarios with significant interaction complexity. Many contemporary prediction-based planning methods frequently overlook interaction modeling, leading to less effective planning performance. This paper introduces a novel prediction-based interactive planning framework that explicitly and mathematically models interactions among traffic entities during the planning process. Our method incorporates interaction reasoning into spatio-temporal (s-t) planning by defining interaction conditions and constraints, and it records and continually updates interaction relations for each planned state throughout the forward search. We assess the performance of our approach alongside state-of-the-art methods in a series of experiments conducted in both single- and multi-modal scenarios, encompassing variations in the accuracy of prediction outcomes and different degrees of planner aggressiveness. The experimental findings demonstrate the effectiveness and robustness of our method, yielding insights applicable to the wider field of autonomous driving. For the community's reference, our code is accessible at https://github.com/ChenYingbing/IR-STP-Planner.
Multimodal deep sensor fusion has the potential to enable autonomous vehicles to visually understand their surrounding environments in all weather conditions. However, existing deep sensor fusion methods usually employ convoluted architectures with intermingled multimodal features, which require large coregistered multimodal datasets for training. In this work, we present an efficient and modular RGB-X fusion network that leverages and fuses pretrained single-modal models via scene-specific fusion modules, enabling joint input-adaptive network architectures to be created from small coregistered multimodal datasets. Our experiments demonstrate the superiority of our method over existing works on RGB-thermal and RGB-gated datasets, performing fusion with only a small number of additional parameters. Our code is available at https://github.com/dsriaditya999/RGBXFusion.
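As an illustration of the modular idea (not the paper's actual fusion module), here is a tiny scene-adaptive fusion gate in Python: pooled statistics of two frozen feature maps drive a per-channel sigmoid gate that blends them, so only the gate's few parameters need training on the small coregistered dataset. The pooling choice, gate parametrization, and names are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fusion_module(f_rgb, f_x, w, b):
    """Tiny scene-adaptive fusion: a per-channel gate computed from pooled
    statistics of the two frozen (C, H, W) feature maps blends them.
    Only w (shape (2C, C)) and b (shape (C,)) would be trained."""
    stats = np.concatenate([f_rgb.mean(axis=(1, 2)), f_x.mean(axis=(1, 2))])  # (2C,)
    g = sigmoid(stats @ w + b)                                                # (C,)
    return g[:, None, None] * f_rgb + (1.0 - g)[:, None, None] * f_x
```

Because the backbones stay frozen and only the blend is learned, the number of additional parameters scales with the channel count rather than with the full networks, which is the efficiency argument behind modular fusion.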
Speech super-resolution (SSR) aims to predict a high-resolution (HR) speech signal from its low-resolution (LR) counterpart. Most neural SSR models focus on producing the final result in a noise-free environment by recovering the spectrogram of the high-frequency part of the signal and concatenating it with the original low-frequency part. Although these methods achieve high accuracy, they become less effective in real-world scenarios, where unavoidable noise is present. To address this problem, we propose Super Denoise Net (SDNet), a neural network for the joint task of super-resolution and noise reduction from a low-sampling-rate signal. To that end, we design gated convolution and lattice convolution blocks to enhance the repair capability and to capture information along the time and frequency axes, respectively. Experiments show that our method outperforms baseline speech denoising and SSR models on the DNS 2020 no-reverb test set, with higher objective and subjective scores.
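A gated convolution can be sketched in a few lines (a 1-D single-channel toy, not SDNet's actual block): a sigmoid gate computed from the input modulates the convolutional features sample by sample, letting the network attenuate noise-dominated regions while passing clean ones through.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_conv1d(x, w_feat, w_gate):
    """Gated 1-D convolution: a learned sigmoid gate modulates each feature
    sample, so the output magnitude never exceeds the ungated features."""
    feat = np.convolve(x, w_feat, mode="same")            # candidate features
    gate = sigmoid(np.convolve(x, w_gate, mode="same"))   # soft mask in (0, 1)
    return feat * gate
```

In a real model `w_feat` and `w_gate` are learned filter banks over many channels; the multiplicative gate is what distinguishes this block from a plain convolution and gives it its selective "repair" behavior.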