



Abstract:3D object-level mapping is a fundamental problem in robotics, which is especially challenging when object CAD models are unavailable during inference. In this work, we propose a framework that can reconstruct high-quality object-level maps for unknown objects. Our approach takes multiple RGB-D images as input and outputs dense 3D shapes and 9-DoF poses (including 3 scale parameters) for detected objects. The core idea of our approach is to leverage a learnt generative model for shape categories as a prior and to formulate a probabilistic, uncertainty-aware optimization framework for 3D reconstruction. We derive a probabilistic formulation that propagates shape and pose uncertainty through two novel loss functions. Unlike current state-of-the-art approaches, we explicitly model the uncertainty of the object shapes and poses during our optimization, resulting in a high-quality object-level mapping system. Moreover, the resulting shape and pose uncertainties, which we demonstrate can accurately reflect the true errors of our object maps, can also be useful for downstream robotics tasks such as active vision. We perform extensive evaluations on indoor and outdoor real-world datasets, achieving achieves substantial improvements over state-of-the-art methods. Our code will be available at https://github.com/TRAILab/UncertainShapePose.




Abstract:6D pose estimation of textureless shiny objects has become an essential problem in many robotic applications. Many pose estimators require high-quality depth data, often measured by structured light cameras. However, when objects have shiny surfaces (e.g., metal parts), these cameras fail to sense complete depths from a single viewpoint due to the specular reflection, resulting in a significant drop in the final pose accuracy. To mitigate this issue, we present a complete active vision framework for 6D object pose refinement and next-best-view prediction. Specifically, we first develop an optimization-based pose refinement module for the structured light camera. Our system then selects the next best camera viewpoint to collect depth measurements by minimizing the predicted uncertainty of the object pose. Compared to previous approaches, we additionally predict measurement uncertainties of future viewpoints by online rendering, which significantly improves the next-best-view prediction performance. We test our approach on the challenging real-world ROBI dataset. The results demonstrate that our pose refinement method outperforms the traditional ICP-based approach when given the same input depth data, and our next-best-view strategy can achieve high object pose accuracy with significantly fewer viewpoints than the heuristic-based policies.



Abstract:For learning-based sound event localization and detection (SELD) methods, different acoustic environments in the training and test sets may result in large performance differences in the validation and evaluation stages. Different environments, such as different sizes of rooms, different reverberation times, and different background noise, may be reasons for a learning-based system to fail. On the other hand, acquiring annotated spatial sound event samples, which include onset and offset time stamps, class types of sound events, and direction-of-arrival (DOA) of sound sources is very expensive. In addition, deploying a SELD system in a new environment often poses challenges due to time-consuming training and fine-tuning processes. To address these issues, we propose Meta-SELD, which applies meta-learning methods to achieve fast adaptation to new environments. More specifically, based on Model Agnostic Meta-Learning (MAML), the proposed Meta-SELD aims to find good meta-initialized parameters to adapt to new environments with only a small number of samples and parameter updating iterations. We can then quickly adapt the meta-trained SELD model to unseen environments. Our experiments compare fine-tuning methods from pre-trained SELD models with our Meta-SELD on the Sony-TAU Realistic Spatial Soundscapes 2023 (STARSSS23) dataset. The evaluation results demonstrate the effectiveness of Meta-SELD when adapting to new environments.
Abstract:The physical design process of large-scale designs is a time-consuming task, often requiring hours to days to complete, with routing being the most critical and complex step. As the the complexity of Integrated Circuits (ICs) increases, there is an increased demand for accurate routing quality prediction. Accurate congestion prediction aids in identifying design flaws early on, thereby accelerating circuit design and conserving resources. Despite the advancements in current congestion prediction methodologies, an essential aspect that has been largely overlooked is the spatial label-correlation between different grids in congestion prediction. The spatial label-correlation is a fundamental characteristic of circuit design, where the congestion status of a grid is not isolated but inherently influenced by the conditions of its neighboring grids. In order to fully exploit the inherent spatial label-correlation between neighboring grids, we propose a novel approach, {\ours}, i.e., VAriational Label-Correlation Enhancement for Congestion Prediction, which considers the local label-correlation in the congestion map, associating the estimated congestion value of each grid with a local label-correlation weight influenced by its surrounding grids. {\ours} leverages variational inference techniques to estimate this weight, thereby enhancing the regression model's performance by incorporating spatial dependencies. Experiment results validate the superior effectiveness of {\ours} on the public available \texttt{ISPD2011} and \texttt{DAC2012} benchmarks using the superblue circuit line.
Abstract:To achieve high-accuracy manipulation in the presence of unknown dynamics and external disturbance, we propose an efficient and robust motion controller (named TvUDE) for robotic manipulators. The controller incorporates a disturbance estimation mechanism that utilizes reformulated robot dynamics and filtering operations to obtain uncertainty and disturbance without requiring measurement of acceleration. Furthermore, we design a time-varying control input gain to enhance the control system's robustness. Finally, we analyze the boundness of the control signal and the stability of the closed-loop system, and conduct a set of experiments on a six-DOF robotic manipulator. The experimental results verify the effectiveness of TvUDE in handling internal uncertainty and external static or transient disturbance.
Abstract:Image inverse halftoning is a classic image restoration task, aiming to recover continuous-tone images from halftone images with only bilevel pixels. Because the halftone images lose much of the original image content, inverse halftoning is a classic ill-problem. Although existing inverse halftoning algorithms achieve good performance, their results lose image details and features. Therefore, it is still a challenge to recover high-quality continuous-tone images. In this paper, we propose an end-to-end multiscale progressively residual learning network (MSPRL), which has a UNet architecture and takes multiscale input images. To make full use of different input image information, we design a shallow feature extraction module to capture similar features between images of different scales. We systematically study the performance of different methods and compare them with our proposed method. In addition, we employ different training strategies to optimize the model, which is important for optimizing the training process and improving performance. Extensive experiments demonstrate that our MSPRL model obtains considerable performance gains in detail restoration.




Abstract:Among the great successes of Reinforcement Learning (RL), self-play algorithms play an essential role in solving competitive games. Current self-play algorithms optimize the agent to maximize expected win-rates against its current or historical copies, making it often stuck in the local optimum and its strategy style simple and homogeneous. A possible solution is to improve the diversity of policies, which helps the agent break the stalemate and enhances its robustness when facing different opponents. However, enhancing diversity in the self-play algorithms is not trivial. In this paper, we aim to introduce diversity from the perspective that agents could have diverse risk preferences in the face of uncertainty. Specifically, we design a novel reinforcement learning algorithm called Risk-sensitive Proximal Policy Optimization (RPPO), which smoothly interpolates between worst-case and best-case policy learning and allows for policy learning with desired risk preferences. Seamlessly integrating RPPO with population-based self-play, agents in the population optimize dynamic risk-sensitive objectives with experiences from playing against diverse opponents. Empirical results show that our method achieves comparable or superior performance in competitive games and that diverse modes of behaviors emerge. Our code is public online at \url{https://github.com/Jackory/RPBT}.
Abstract:Reconfigurable intelligent surface (RIS) is a new technique that is able to manipulate the wireless environment smartly and has been exploited for assisting the wireless communications, especially at high frequency band. However, it suffers from hardware impairments (HWIs) in practical designs, which inevitably degrades its performance and thus limits its full potential. To address this practical issue, we first propose a new RIS reflection model involving phase-shift errors, which is then verified by the measurement results from field trials. With this beamforming model, various phase-shift errors caused by different HWIs can be analyzed. The phase-shift errors are classified into three categories: (1) globally independent and identically distributed errors, (2) grouped independent and identically distributed errors and (3) grouped fixed errors. The impact of typical HWIs, including frequency mismatch, PIN diode failures and panel deformation, on RIS beamforming ability are studied with the theoretical model and are compared with numerical results. The impact of frequency mismatch are discussed separately for narrow-band and wide-band beamforming. Finally, useful insights and guidelines on the RIS design and its deployment are highlighted for practical wireless systems.




Abstract:Equipped with the trained environmental dynamics, model-based offline reinforcement learning (RL) algorithms can often successfully learn good policies from fixed-sized datasets, even some datasets with poor quality. Unfortunately, however, it can not be guaranteed that the generated samples from the trained dynamics model are reliable (e.g., some synthetic samples may lie outside of the support region of the static dataset). To address this issue, we propose Trajectory Truncation with Uncertainty (TATU), which adaptively truncates the synthetic trajectory if the accumulated uncertainty along the trajectory is too large. We theoretically show the performance bound of TATU to justify its benefits. To empirically show the advantages of TATU, we first combine it with two classical model-based offline RL algorithms, MOPO and COMBO. Furthermore, we integrate TATU with several off-the-shelf model-free offline RL algorithms, e.g., BCQ. Experimental results on the D4RL benchmark show that TATU significantly improves their performance, often by a large margin.




Abstract:The Outstanding performance and growing size of Large Language Models has led to increased attention in parameter efficient learning. The two predominant approaches are Adapters and Pruning. Adapters are to freeze the model and give it a new weight matrix on the side, which can significantly reduce the time and memory of training, but the cost is that the evaluation and testing will increase the time and memory consumption. Pruning is to cut off some weight and re-distribute the remaining weight, which sacrifices the complexity of training at the cost of extremely high memory and training time, making the cost of evaluation and testing relatively low. So efficiency of training and inference can't be obtained in the same time. In this work, we propose a task-oriented Pruning-Adapter method that achieve a high memory efficiency of training and memory, and speeds up training time and ensures no significant decrease in accuracy in GLUE tasks, achieving training and inference efficiency at the same time.