Xuelong Li

Affordance-Driven Next-Best-View Planning for Robotic Grasping

Sep 18, 2023
Xuechao Zhang, Dong Wang, Sun Han, Weichuang Li, Bin Zhao, Zhigang Wang, Xiaoming Duan, Chongrong Fang, Xuelong Li, Jianping He

Grasping occluded objects in cluttered environments is an essential component of complex robotic manipulation tasks. In this paper, we introduce an AffordanCE-driven Next-Best-View planning policy (ACE-NBV) that tries to find a feasible grasp for the target object by continuously observing the scene from new viewpoints. This policy is motivated by the observation that the grasp affordances of an occluded object can be better measured when the observation view direction is the same as the grasp view. Specifically, our method leverages the paradigm of novel view imagery to predict grasp affordances under previously unobserved views, and selects the next observation view based on the gain in the highest imagined grasp quality of the target object. Experimental results in simulation and on a real robot demonstrate the effectiveness of the proposed affordance-driven next-best-view planning policy. Additional results, code, and videos of the real robot experiments can be found in the supplementary materials.
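
As a rough illustration of the selection rule described above, the sketch below loops over candidate viewpoints, imagines the grasp affordance under each, and keeps the view with the largest quality gain. The `predict_grasp_affordance` callable and all shapes are hypothetical stand-ins, not the authors' actual interface.

```python
import numpy as np

def select_next_best_view(scene_latent, candidate_views,
                          predict_grasp_affordance, current_best_quality):
    """Pick the candidate view whose imagined grasp quality gain is largest."""
    best_view, best_gain = None, 0.0
    for view in candidate_views:
        # Imagine the target's grasp affordance under this unobserved view.
        imagined_quality = predict_grasp_affordance(scene_latent, view)
        gain = imagined_quality - current_best_quality
        if gain > best_gain:
            best_view, best_gain = view, gain
    return best_view  # None means no candidate promises a better grasp

# Toy usage with a stand-in affordance predictor.
rng = np.random.default_rng(0)
views = [rng.normal(size=3) for _ in range(8)]
predictor = lambda latent, v: float(1.0 / (1.0 + np.exp(-latent @ v)))
next_view = select_next_best_view(rng.normal(size=3), views, predictor, 0.5)
```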

* Conference on Robot Learning (CoRL) 2023 

Robust Quadrupedal Locomotion via Risk-Averse Policy Learning

Sep 01, 2023
Jiyuan Shi, Chenjia Bai, Haoran He, Lei Han, Dong Wang, Bin Zhao, Mingguo Zhao, Xiu Li, Xuelong Li

The robustness of legged locomotion is crucial for quadrupedal robots in challenging terrains. Recently, Reinforcement Learning (RL) has shown promising results in legged locomotion, and various methods try to integrate privileged distillation, scene modeling, and external sensors to improve the generalization and robustness of locomotion policies. However, these methods struggle to handle uncertain scenarios such as abrupt terrain changes or unexpected external forces. In this paper, we take a novel risk-sensitive perspective to enhance the robustness of legged locomotion. Specifically, we employ a distributional value function learned by quantile regression to model the aleatoric uncertainty of the environment, and perform risk-averse policy learning by optimizing the worst-case scenarios via a risk distortion measure. Extensive experiments in both simulation and on a real Aliengo robot demonstrate that our method handles various external disturbances efficiently, and the resulting policy exhibits improved robustness in harsh and uncertain situations. Videos are available at https://risk-averse-locomotion.github.io/.
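
For intuition, here is a minimal sketch of the two risk-averse ingredients named in the abstract: a quantile-regression (pinball) loss for fitting a distributional value function, and a CVaR-style lower-tail distortion for risk-averse evaluation. CVaR is one common risk distortion measure; the paper's exact choice may differ.

```python
import numpy as np

def quantile_regression_loss(pred_quantiles, target_samples):
    """Pinball loss that fits N quantiles of the return distribution."""
    n = len(pred_quantiles)
    taus = (np.arange(n) + 0.5) / n                          # quantile midpoints
    u = target_samples[None, :] - pred_quantiles[:, None]    # pairwise TD errors
    return float(np.mean(np.abs(taus[:, None] - (u < 0)) * np.abs(u)))

def cvar_value(quantiles, alpha=0.25):
    """Risk-averse value: average the worst alpha-fraction of quantiles."""
    q = np.sort(np.asarray(quantiles))
    k = max(1, int(np.ceil(alpha * len(q))))
    return float(q[:k].mean())

# Toy usage: a risk-averse policy would act to maximize the distorted value.
quantiles = np.array([-2.0, -0.5, 0.1, 0.4, 0.9, 1.3, 1.8, 2.5])
print(cvar_value(quantiles, alpha=0.25))   # mean of the two worst quantiles
```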

* 8 pages, 5 figures 

QKSAN: A Quantum Kernel Self-Attention Network

Aug 25, 2023
Ren-Xin Zhao, Jinjing Shi, Xuelong Li

The Self-Attention Mechanism (SAM) is skilled at extracting important information from the interior of data to improve the computational efficiency of models. Nevertheless, many Quantum Machine Learning (QML) models lack the ability that SAM has to distinguish the intrinsic connections within information, which limits their effectiveness on massive, high-dimensional quantum data. To address this issue, a Quantum Kernel Self-Attention Mechanism (QKSAM) is introduced, which combines the data representation benefit of Quantum Kernel Methods (QKM) with the efficient information extraction capability of SAM. A Quantum Kernel Self-Attention Network (QKSAN) framework is built on QKSAM, using the Deferred Measurement Principle (DMP) and conditional measurement techniques, which release half of the quantum resources through probabilistic measurements during computation. The Quantum Kernel Self-Attention Score (QKSAS) determines the measurement conditions and reflects the probabilistic nature of quantum systems. Finally, four QKSAN models are deployed on the PennyLane platform to perform binary classification on MNIST images, and the best-performing of the four is assessed for noise immunity and learning ability. Remarkably, a potential learning advantage of some QKSAN models over classical deep learning is that they require few parameters to achieve 98% ± 1% test and training accuracy, even with highly compressed images. QKSAN lays the foundation for future quantum computers to perform machine learning on massive amounts of data, while driving advances in areas such as quantum Natural Language Processing (NLP).
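
Since the models are deployed on PennyLane, a hedged sketch of the core idea might look like the following: a fidelity-style quantum kernel, computed by encoding one input and applying the adjoint encoding of the other, whose scores are reused as attention weights. This is a generic illustration of a quantum-kernel attention score, not the authors' QKSAM circuit.

```python
import numpy as np
import pennylane as qml

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def kernel_circuit(x, y):
    # Encode x, then apply the adjoint encoding of y; the probability of the
    # all-zeros outcome equals the kernel value k(x, y) = |<phi(y)|phi(x)>|^2.
    qml.AngleEmbedding(x, wires=range(n_qubits))
    qml.adjoint(qml.AngleEmbedding)(y, wires=range(n_qubits))
    return qml.probs(wires=range(n_qubits))

def quantum_kernel(x, y):
    return kernel_circuit(x, y)[0]  # probability of the all-zeros state

def quantum_kernel_attention(queries, keys, values):
    # Kernel scores play the role of attention logits, normalized by softmax.
    scores = np.array([[quantum_kernel(q, k) for k in keys] for q in queries])
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return weights @ values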

Disentangled Contrastive Image Translation for Nighttime Surveillance

Jul 11, 2023
Guanzhou Lan, Bin Zhao, Xuelong Li

Nighttime surveillance suffers from degradation due to poor illumination and arduous human annotation; it remains challenging and poses a security risk at night. Existing methods rely on multi-spectral images to perceive objects in the dark, but these suffer from low resolution and the absence of color. We argue that the ultimate solution for nighttime surveillance is night-to-day translation, or Night2Day, which aims to translate a surveillance scene from nighttime to daytime while maintaining semantic consistency. To this end, this paper presents a Disentangled Contrastive (DiCo) learning method. Specifically, to address the poor and complex illumination of nighttime scenes, we propose a learnable physical prior, i.e., the color invariant, which provides a stable perception of a highly dynamic night environment and can be incorporated into the learning pipeline of neural networks. Targeting surveillance scenes, we develop a disentangled representation, an auxiliary pretext task that separates surveillance scenes into foreground and background with contrastive learning. This strategy extracts semantics without supervision and boosts our model to achieve instance-aware translation. Finally, we incorporate all of the above modules into generative adversarial networks and achieve high-fidelity translation. This paper also contributes a new surveillance dataset called NightSuR. It includes six scenes to support the study of nighttime surveillance, with nighttime images covering different properties of nighttime environments, such as flare and extreme darkness. Extensive experiments demonstrate that our method significantly outperforms existing works. The dataset and source code will be released on GitHub soon.
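
As an illustration of the contrastive pretext task, the sketch below shows a standard InfoNCE loss that pulls two foreground features together while pushing background features away; it is a generic stand-in under those assumptions, not the DiCo implementation.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.07):
    """anchor/positive: (D,) foreground features; negatives: (N, D) background."""
    anchor = F.normalize(anchor, dim=0)
    positive = F.normalize(positive, dim=0)
    negatives = F.normalize(negatives, dim=1)
    pos_logit = (anchor @ positive).unsqueeze(0) / temperature   # (1,)
    neg_logits = (negatives @ anchor) / temperature              # (N,)
    logits = torch.cat([pos_logit, neg_logits]).unsqueeze(0)     # positive is class 0
    return F.cross_entropy(logits, torch.zeros(1, dtype=torch.long))
```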

* Submitted to TIP 

Sequential Attention Source Identification Based on Feature Representation

Jun 28, 2023
Dongpeng Hou, Zhen Wang, Chao Gao, Xuelong Li

Snapshot-observation-based source localization has been widely studied due to its accessibility and low cost. However, existing methods do not account for user interactions in time-varying infection scenarios, so their accuracy decreases in heterogeneous interaction scenarios. To address this critical issue, this paper proposes a sequence-to-sequence localization framework called Temporal-sequence based Graph Attention Source Identification (TGASI), built on an inductive learning idea. More specifically, the encoder focuses on generating multiple features by estimating the influence probability between two users, and the decoder distinguishes the importance of predicted sources at different timestamps through a designed temporal attention mechanism. It is worth mentioning that the inductive learning idea ensures that TGASI can detect sources in new scenarios without requiring other prior knowledge, which demonstrates the scalability of TGASI. Comprehensive experiments against SOTA methods demonstrate TGASI's superior detection performance and scalability across different scenarios.
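
The temporal attention idea can be sketched as follows: the decoder state attends over per-timestamp encoder features to build a context vector. Module names and shapes are illustrative assumptions, not the released TGASI code.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, decoder_state, encoder_seq):
        # decoder_state: (B, D); encoder_seq: (B, T, D), one feature per timestamp
        T = encoder_seq.size(1)
        expanded = decoder_state.unsqueeze(1).expand(-1, T, -1)
        scores = self.score(torch.cat([expanded, encoder_seq], dim=-1)).squeeze(-1)
        weights = torch.softmax(scores, dim=1)                # (B, T)
        return (weights.unsqueeze(-1) * encoder_seq).sum(1)   # context: (B, D)
```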

* Proceedings of the 32nd International Joint Conference on Artificial Intelligence, 2023  

Hierarchical Matching and Reasoning for Multi-Query Image Retrieval

Jun 26, 2023
Zhong Ji, Zhihao Li, Yan Zhang, Haoran Wang, Yanwei Pang, Xuelong Li

As a promising field, Multi-Query Image Retrieval (MQIR) aims at searching for the semantically relevant image given multiple region-specific text queries. Existing works mainly focus on a single-level similarity between image regions and text queries, which neglects the hierarchical guidance of multi-level similarities and results in incomplete alignments. Besides, the high-level semantic correlations that intrinsically connect different region-query pairs are rarely considered. To address these limitations, we propose a novel Hierarchical Matching and Reasoning Network (HMRN) for MQIR. It disentangles MQIR into three hierarchical semantic representations, which are responsible for capturing fine-grained local details, contextual global scopes, and high-level inherent correlations. HMRN comprises two modules: a Scalar-based Matching (SM) module and a Vector-based Reasoning (VR) module. Specifically, the SM module characterizes the multi-level alignment similarity, which consists of a fine-grained local-level similarity and a context-aware global-level similarity. Afterwards, the VR module is developed to excavate the potential semantic correlations among multiple region-query pairs, which further explores the high-level reasoning similarity. Finally, these three-level similarities are aggregated into a joint similarity space to form the ultimate similarity. Extensive experiments on the benchmark dataset demonstrate that our HMRN substantially surpasses the current state-of-the-art methods. For instance, compared with the existing best method, Drill-down, the R@1 metric in the last round is improved by 23.4%. Our source code will be released at https://github.com/LZH-053/HMRN.
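
As a rough sketch of the Scalar-based Matching idea, the snippet below computes a fine-grained local similarity between region and query features and a context-aware global similarity from pooled features, then fuses them with the reasoning similarity. The pooling and fusion weights are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def scalar_matching(region_feats, query_feats):
    # region_feats: (R, D); query_feats: (Q, D)
    # Fine-grained local-level similarity: best-matching region per query.
    local = F.cosine_similarity(region_feats.unsqueeze(1),
                                query_feats.unsqueeze(0), dim=-1)   # (R, Q)
    local_sim = local.max(dim=0).values.mean()
    # Context-aware global-level similarity from mean-pooled features.
    global_sim = F.cosine_similarity(region_feats.mean(0, keepdim=True),
                                     query_feats.mean(0, keepdim=True)).squeeze()
    return local_sim, global_sim

def joint_similarity(local_sim, global_sim, reasoning_sim, w=(1.0, 1.0, 1.0)):
    # Aggregate the three-level similarities into the ultimate similarity.
    return w[0] * local_sim + w[1] * global_sim + w[2] * reasoning_sim
```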

Variational Positive-incentive Noise: How Noise Benefits Models

Jun 13, 2023
Hongyuan Zhang, Sida Huang, Xuelong Li

A large number of works aim to alleviate the impact of noise, owing to the conventional assumption that noise plays a negative role. However, some existing works show that this assumption does not always hold. In this paper, we investigate how classical models can benefit from random noise under the framework of Positive-incentive Noise (Pi-Noise). Since the ideal objective of Pi-Noise is intractable, we propose to optimize its variational bound instead, namely variational Pi-Noise (VPN). Using variational inference, a VPN generator implemented with neural networks is designed to enhance base models and simplify their inference, without changing their architecture. Benefiting from the independent design of base models and VPN generators, the VPN generator can work with most existing models. Experiments show that the proposed VPN generator improves the base models. Appealingly, the trained VPN generator prefers to blur the irrelevant ingredients of complicated images, which meets our expectations.
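
A minimal sketch of the generator idea, assuming the noise is an input-dependent Gaussian sampled with the reparameterization trick and simply added to the base model's input; layer sizes and the injection point are illustrative, not the paper's design.

```python
import torch
import torch.nn as nn

class VPNGenerator(nn.Module):
    """Produces input-dependent Gaussian noise to inject into a frozen base model."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                nn.Linear(hidden, dim))
        self.log_var = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, dim))

    def forward(self, x):
        mu, log_var = self.mu(x), self.log_var(x)
        # Reparameterization trick keeps the sampling step differentiable.
        noise = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)
        return x + noise  # noisy input fed to the unchanged base model
```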

Image Reconstruction for Accelerated MR Scan with Faster Fourier Convolutional Neural Networks

Jun 05, 2023
Xiaohan Liu, Yanwei Pang, Xuebin Sun, Yiming Liu, Yonghong Hou, Zhenchang Wang, Xuelong Li

Partial scan is a common approach to accelerate Magnetic Resonance Imaging (MRI) data acquisition in both 2D and 3D settings. However, accurately reconstructing images from partial scan data (i.e., incomplete k-space matrices) remains challenging due to the lack of an effective global receptive field in both the spatial and k-space domains. To address this problem, we propose the following: (1) a novel convolutional operator called Faster Fourier Convolution (FasterFC), which replaces the two consecutive convolution operations typically used in convolutional neural networks (e.g., U-Net, ResNet). Based on the spectral convolution theorem of Fourier theory, FasterFC employs alternating kernels of size 1 in different domains to extend the dual-domain receptive field to the global scale, and achieves faster calculation speed than the traditional Fast Fourier Convolution (FFC). (2) A 2D accelerated MRI method, FasterFC-End-to-End-VarNet, which uses FasterFC to improve sensitivity maps and reconstruction quality. (3) A multi-stage 3D accelerated MRI method called FasterFC-based Single-to-group Network (FAS-Net), which utilizes a single-to-group algorithm to guide k-space domain reconstruction, followed by FasterFC-based cascaded convolutional neural networks that expand the effective receptive field in the dual domain. Experimental results on the fastMRI and Stanford MRI Data datasets demonstrate that FasterFC improves the quality of both 2D and 3D reconstruction. Moreover, FAS-Net, as a 3D high-resolution multi-coil (eight-coil) accelerated MRI method, achieves superior reconstruction performance in both qualitative and quantitative results compared with state-of-the-art 2D and 3D methods.
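
To make the alternating-domain idea concrete, here is a hedged sketch of a dual-domain block: a 1x1 convolution in the spatial domain followed by a 1x1 convolution applied to the FFT of the features, which is globally receptive by the spectral convolution theorem. It illustrates the principle only and is not the paper's exact FasterFC operator.

```python
import torch
import torch.nn as nn

class DualDomainConv(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, kernel_size=1)
        # Real and imaginary parts are stacked along channels for the k-space conv.
        self.spectral = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)

    def forward(self, x):
        x = self.spatial(x)                                      # spatial domain
        freq = torch.fft.rfft2(x, norm="ortho")                  # to k-space
        f = torch.cat([freq.real, freq.imag], dim=1)
        f = self.spectral(f)                                     # k-space domain
        real, imag = torch.chunk(f, 2, dim=1)
        return torch.fft.irfft2(torch.complex(real, imag),
                                s=x.shape[-2:], norm="ortho")    # back to image
```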

Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning

May 29, 2023
Haoran He, Chenjia Bai, Kang Xu, Zhuoran Yang, Weinan Zhang, Dong Wang, Bin Zhao, Xuelong Li

Diffusion models have demonstrated highly expressive generative capabilities in vision and NLP. Recent studies in reinforcement learning (RL) have shown that diffusion models are also powerful in modeling complex policies or trajectories in offline datasets. However, these works have been limited to single-task settings, where a generalist agent capable of addressing multi-task predicaments is absent. In this paper, we aim to investigate the effectiveness of a single diffusion model in modeling large-scale multi-task offline data, which can be challenging due to diverse and multimodal data distributions. Specifically, we propose the Multi-Task Diffusion Model (MTDiff), a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis in multi-task offline settings. MTDiff leverages the vast amounts of knowledge available in multi-task data and performs implicit knowledge sharing among tasks. For generative planning, we find that MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D. For data synthesis, MTDiff generates high-quality data for testing tasks given a single demonstration as a prompt, which enhances low-quality datasets for even unseen tasks.
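
For intuition, a prompt-conditioned Transformer denoiser for trajectories could be sketched as below: demonstration prompt tokens are prepended to the noisy trajectory tokens, a diffusion timestep embedding is added, and the network predicts the noise for the trajectory part. All module names and sizes are assumptions, not the MTDiff architecture.

```python
import torch
import torch.nn as nn

class PromptDenoiser(nn.Module):
    def __init__(self, dim, heads=4, layers=2, max_timesteps=1000):
        super().__init__()
        enc = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(enc, layers)
        self.time_embed = nn.Embedding(max_timesteps, dim)
        self.head = nn.Linear(dim, dim)

    def forward(self, noisy_traj, prompt, t):
        # prompt: (B, P, D) demonstration tokens; noisy_traj: (B, T, D); t: (B,)
        tokens = torch.cat([prompt, noisy_traj], dim=1)
        tokens = tokens + self.time_embed(t)[:, None, :]   # add timestep embedding
        h = self.backbone(tokens)
        return self.head(h[:, prompt.size(1):])  # predicted noise for trajectory
```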

* 21 pages 