This paper explores an energy-modified leverage sampling strategy for matrix completion in radio map construction. The main goal is to address potential identifiability issues in matrix completion with sparse observations by using a probabilistic sampling approach. Although conventional leverage sampling is commonly employed for designing sampling patterns, it often assigns high sampling probability to locations with low received signal strength (RSS) values, leading to a low sampling efficiency. Theoretical analysis demonstrates that the leverage score produces pseudo images of sources, and in the regions around the source locations, the leverage probability is asymptotically consistent with the RSS. Based on this finding, an energy-modified leverage probability-based sampling strategy is investigated for efficient sampling. Numerical demonstrations indicate that the proposed sampling strategy can decrease the normalized mean squared error (NMSE) of radio map construction by more than 10% for both matrix completion and interpolation-assisted matrix completion schemes, compared to conventional methods.
Machine learning (ML) facilitates rapid channel modeling for 5G and beyond wireless communication systems. Many existing ML techniques utilize a city map to construct the radio map; however, an updated city map may not always be available. This paper proposes to employ the received signal strength (RSS) data to jointly construct the radio map and the virtual environment by exploiting the geometry structure of the environment. In contrast to many existing ML approaches that lack of an environment model, we develop a virtual obstacle model and characterize the geometry relation between the propagation paths and the virtual obstacles. A multi-screen knife-edge model is adopted to extract the key diffraction features, and these features are fed into a neural network (NN) for diffraction representation. To describe the scattering, as oppose to most existing methods that directly input an entire city map, our model focuses on the geometry structure from the local area surrounding the TX-RX pair and the spatial invariance of such local geometry structure is exploited. Numerical experiments demonstrate that, in addition to reconstructing a 3D virtual environment, the proposed model outperforms the state-of-the-art methods in radio map construction with 10%-18% accuracy improvements. It can also reduce 20% data and 50% training epochs when transferred to a new environment.
This paper addresses the challenge of reconstructing a 3D power spectrum map from sparse, scattered, and incomplete spectrum measurements. It proposes an integrated approach combining interpolation and block-term tensor decomposition (BTD). This approach leverages an interpolation model with the BTD structure to exploit the spatial correlation of power spectrum maps. Additionally, nuclear norm regularization is incorporated to effectively capture the low-rank characteristics. To implement this approach, a novel algorithm that combines alternating regression with singular value thresholding is developed. Analytical justification for the enhancement provided by the BTD structure in interpolating power spectrum maps is provided, yielding several important theoretical insights. The analysis explores the impact of the spectrum on the error in the proposed method and compares it to conventional local polynomial interpolation. Extensive numerical results demonstrate that the proposed method outperforms state-of-the-art methods in terms of signal source separation and power spectrum map construction, and remains stable under off-grid measurements and inhomogeneous measurement topologies.
Robotic behavior synthesis, the problem of understanding multimodal inputs and generating precise physical control for robots, is an important part of Embodied AI. Despite successes in applying multimodal large language models for high-level understanding, it remains challenging to translate these conceptual understandings into detailed robotic actions while achieving generalization across various scenarios. In this paper, we propose a tree-structured multimodal code generation framework for generalized robotic behavior synthesis, termed RoboCodeX. RoboCodeX decomposes high-level human instructions into multiple object-centric manipulation units consisting of physical preferences such as affordance and safety constraints, and applies code generation to introduce generalization ability across various robotics platforms. To further enhance the capability to map conceptual and perceptual understanding into control commands, a specialized multimodal reasoning dataset is collected for pre-training and an iterative self-updating methodology is introduced for supervised fine-tuning. Extensive experiments demonstrate that RoboCodeX achieves state-of-the-art performance in both simulators and real robots on four different kinds of manipulation tasks and one navigation task.
Rapid progress in high-level task planning and code generation for open-world robot manipulation has been witnessed in Embodied AI. However, previous studies put much effort into general common sense reasoning and task planning capabilities of large-scale language or multi-modal models, relatively little effort on ensuring the deployability of generated code on real robots, and other fundamental components of autonomous robot systems including robot perception, motion planning, and control. To bridge this ``ideal-to-real'' gap, this paper presents \textbf{RobotScript}, a platform for 1) a deployable robot manipulation pipeline powered by code generation; and 2) a code generation benchmark for robot manipulation tasks in free-form natural language. The RobotScript platform addresses this gap by emphasizing the unified interface with both simulation and real robots, based on abstraction from the Robot Operating System (ROS), ensuring syntax compliance and simulation validation with Gazebo. We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms, and multiple grippers. Additionally, our benchmark assesses reasoning abilities for physical space and constraints, highlighting the differences between GPT-3.5, GPT-4, and Gemini in handling complex physical interactions. Finally, we present a thorough evaluation on the whole system, exploring how each module in the pipeline: code generation, perception, motion planning, and even object geometric properties, impact the overall performance of the system.
In the 6G era, real-time radio resource monitoring and management are urged to support diverse wireless-empowered applications. This calls for fast and accurate estimation on the distribution of the radio resources, which is usually represented by the spatial signal power strength over the geographical environment, known as a radio map. In this paper, we present a cooperative radio map estimation (CRME) approach enabled by the generative adversarial network (GAN), called as GAN-CRME, which features fast and accurate radio map estimation without the transmitters' information. The radio map is inferred by exploiting the interaction between distributed received signal strength (RSS) measurements at mobile users and the geographical map using a deep neural network estimator, resulting in low data-acquisition cost and computational complexity. Moreover, a GAN-based learning algorithm is proposed to boost the inference capability of the deep neural network estimator by exploiting the power of generative AI. Simulation results showcase that the proposed GAN-CRME is even capable of coarse error-correction when the geographical map information is inaccurate.
Herein, an interference-aware predictive aerial-and-terrestrial communication problem is studied, where an unmanned aerial vehicle (UAV) delivers some data payload to a few nodes within a communication deadline. The first challenge is the possible interference to the ground base stations (BSs) and users possibly at unknown locations. This paper develops a radio-map-based approach to predict the channel to the receivers and the unintended nodes. Therefore, a predictive communication strategy can be optimized ahead of time to reduce the interference power and duration for the ground nodes. Such predictive optimization raises the second challenge of developing a low-complexity solution for a batch of transmission strategies over T time slots for N receivers before the flight. Mathematically, while the proposed interference-aware predictive communication problem is non-convex, it is converted into a relaxed convex problem, and solved by a novel dual-based algorithm, which is shown to achieve global optimality at asymptotically small slot duration. The proposed algorithm demonstrates orders of magnitude saving of the computational time for moderate T and N compared to several existing solvers. Simulations show that the radio-map-assisted scheme can prevent all unintended receivers with known positions from experiencing interference and significantly reduce the interference to the users at unknown locations.
Sixth-generation (6G) mobile communication networks are expected to have dense infrastructures, large-dimensional channels, cost-effective hardware, diversified positioning methods, and enhanced intelligence. Such trends bring both new challenges and opportunities for the practical design of 6G. On one hand, acquiring channel state information (CSI) in real time for all wireless links becomes quite challenging in 6G. On the other hand, there would be numerous data sources in 6G containing high-quality location-tagged channel data, making it possible to better learn the local wireless environment. By exploiting such new opportunities and for tackling the CSI acquisition challenge, there is a promising paradigm shift from the conventional environment-unaware communications to the new environment-aware communications based on the novel approach of channel knowledge map (CKM). This article aims to provide a comprehensive tutorial overview on environment-aware communications enabled by CKM to fully harness its benefits for 6G. First, the basic concept of CKM is presented, and a comparison of CKM with various existing channel inference techniques is discussed. Next, the main techniques for CKM construction are discussed, including both the model-free and model-assisted approaches. Furthermore, a general framework is presented for the utilization of CKM to achieve environment-aware communications, followed by some typical CKM-aided communication scenarios. Finally, important open problems in CKM research are highlighted and potential solutions are discussed to inspire future work.
Radio map construction requires a large amount of radio measurement data with location labels, which imposes a high deployment cost. This paper develops a region-based radio map from received signal strength (RSS) measurements without location labels. The construction is based on a set of blindly collected RSS measurement data from a device that visits each region in an indoor area exactly once, where the footprints and timestamps are not recorded. The main challenge is to cluster the RSS data and match clusters with the physical regions. Classical clustering algorithms fail to work as the RSS data naturally appears as non-clustered due to multipaths and noise. In this paper, a signal subspace model with a sequential prior is constructed for the RSS data, and an integrated segmentation and clustering algorithm is developed, which is shown to find the globally optimal solution in a special case. Furthermore, the clustered data is matched with the physical regions using a graph-based approach. Based on real measurements from an office space, the proposed scheme reduces the region localization error by roughly 50% compared to a weighted centroid localization (WCL) baseline, and it even outperforms some supervised localization schemes, including k-nearest neighbor (KNN), support vector machine (SVM), and deep neural network (DNN), which require labeled data for training.
Object goal navigation is an important problem in Embodied AI that involves guiding the agent to navigate to an instance of the object category in an unknown environment -- typically an indoor scene. Unfortunately, current state-of-the-art methods for this problem rely heavily on data-driven approaches, \eg, end-to-end reinforcement learning, imitation learning, and others. Moreover, such methods are typically costly to train and difficult to debug, leading to a lack of transferability and explainability. Inspired by recent successes in combining classical and learning methods, we present a modular and training-free solution, which embraces more classic approaches, to tackle the object goal navigation problem. Our method builds a structured scene representation based on the classic visual simultaneous localization and mapping (V-SLAM) framework. We then inject semantics into geometric-based frontier exploration to reason about promising areas to search for a goal object. Our structured scene representation comprises a 2D occupancy map, semantic point cloud, and spatial scene graph. Our method propagates semantics on the scene graphs based on language priors and scene statistics to introduce semantic knowledge to the geometric frontiers. With injected semantic priors, the agent can reason about the most promising frontier to explore. The proposed pipeline shows strong experimental performance for object goal navigation on the Gibson benchmark dataset, outperforming the previous state-of-the-art. We also perform comprehensive ablation studies to identify the current bottleneck in the object navigation task.