Qiang Zhang

Prompt, Plan, Perform: LLM-based Humanoid Control via Quantized Imitation Learning

Sep 20, 2023
Jingkai Sun, Qiang Zhang, Yiqun Duan, Xiaoyang Jiang, Chong Cheng, Renjing Xu

In recent years, reinforcement learning and imitation learning have shown great potential for controlling the motion of humanoid robots. However, these methods typically create simulation environments and rewards for specific tasks, which requires multiple policies and limits their ability to tackle complex and unknown tasks. To overcome these issues, we present a novel approach that combines adversarial imitation learning with large language models (LLMs). This method enables the agent to learn reusable skills with a single policy and to solve zero-shot tasks under the guidance of LLMs. In particular, we utilize the LLM as a strategic planner that applies previously learned skills to novel tasks by comprehending task-specific prompts, empowering the robot to perform the specified actions in sequence. To improve our model, we incorporate codebook-based vector quantization, allowing the agent to generate suitable actions in response to unseen textual commands from LLMs. Furthermore, we design general reward functions that consider the distinct motion features of humanoid robots, ensuring the agent imitates the motion data while maintaining goal orientation without additional guiding-direction approaches or policies. To the best of our knowledge, this is the first framework that controls humanoid robots using a single learning policy network with an LLM as the planner. Extensive experiments demonstrate that our method is efficient and adaptive in complicated motion tasks.
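
For readers unfamiliar with codebook-based vector quantization, the sketch below shows the standard VQ-VAE-style mechanism in PyTorch: a continuous latent is snapped to its nearest codebook entry with a straight-through gradient. The class name CodebookVQ, the codebook size, and the loss weighting are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of codebook-based vector quantization (VQ-VAE style).
# Names and hyperparameters are illustrative, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodebookVQ(nn.Module):
    def __init__(self, num_codes: int = 64, code_dim: int = 32, beta: float = 0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta  # commitment-loss weight

    def forward(self, z_e: torch.Tensor):
        # z_e: (batch, code_dim) continuous latent, e.g. a skill embedding
        distances = torch.cdist(z_e, self.codebook.weight)   # (batch, num_codes)
        indices = distances.argmin(dim=-1)                    # nearest code per sample
        z_q = self.codebook(indices)                          # quantized latents
        # Standard VQ objective: codebook loss + commitment loss
        loss = F.mse_loss(z_q, z_e.detach()) + self.beta * F.mse_loss(z_e, z_q.detach())
        # Straight-through estimator so gradients flow back to the encoder
        z_q = z_e + (z_q - z_e).detach()
        return z_q, indices, loss

vq = CodebookVQ()
z_q, idx, vq_loss = vq(torch.randn(8, 32))
print(z_q.shape, idx.shape, vq_loss.item())
```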

A Delay Compensation Framework Based on Eye-Movement for Teleoperated Ground Vehicles

Sep 14, 2023
Qiang Zhang, Lingfang Yang, Zhi Huang, Xiaolin Song

An eye-movement-based predicted trajectory guidance control (ePTGC) framework is proposed to mitigate the maneuverability degradation of a teleoperated ground vehicle caused by communication delays. Human sensitivity to delays is the main reason for the performance degradation of a ground vehicle teleoperation system. The proposed framework extracts human intention from eye movements and combines it with contextual constraints to generate an intention-compliant guidance trajectory, which is then used to control the vehicle directly. The advantage of this approach is that the teleoperator is removed from the direct control loop, since the generated trajectories guide the vehicle, thus reducing the adverse sensitivity to delay. The delay can be compensated as long as the prediction horizon exceeds the delay. A human-in-the-loop simulation platform is designed to evaluate the teleoperation performance of the proposed method at different delay levels. The results are analyzed by repeated-measures ANOVA, which shows that the proposed method significantly improves maneuverability and reduces cognitive burden at large delay levels (>200 ms). The overall performance is also much better than that of the PTGC, which does not use the eye-movement feature.
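
The key condition, that compensation works whenever the prediction horizon exceeds the delay, can be illustrated with a toy sketch. The constant-velocity predictor and all names below are stand-ins for the intention-compliant guidance trajectory, not the paper's actual model.

```python
# Hypothetical sketch of the prediction-horizon-vs-delay idea: if the guidance
# trajectory covers at least the communication delay, the vehicle can follow
# the predicted point instead of the delayed operator command.
import numpy as np

def predict_trajectory(pos, vel, horizon_s=1.0, dt=0.05):
    """Constant-velocity stand-in for the intention-compliant guidance trajectory."""
    steps = int(horizon_s / dt)
    return np.array([pos + vel * dt * k for k in range(1, steps + 1)])

def guidance_point(trajectory, delay_s, dt=0.05):
    """Pick the trajectory sample that compensates the communication delay."""
    idx = int(round(delay_s / dt))
    if idx >= len(trajectory):      # delay exceeds the prediction horizon:
        return trajectory[-1]       # compensation is no longer complete
    return trajectory[idx]

traj = predict_trajectory(np.array([0.0, 0.0]), np.array([10.0, 0.0]))  # 10 m/s forward
print(guidance_point(traj, delay_s=0.2))  # target pose ~0.2 s ahead, masking the delay
```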

* 9 pages, 11 figures 

Spiking Denoising Diffusion Probabilistic Models

Jun 29, 2023
Jiahang Cao, Ziqing Wang, Hanzhong Guo, Hao Cheng, Qiang Zhang, Renjing Xu

Spiking neural networks (SNNs) have ultra-low energy consumption and high biological plausibility compared with artificial neural networks (ANNs) due to their binary and bio-driven nature. While previous research has primarily focused on enhancing the performance of SNNs in classification tasks, the generative potential of SNNs remains relatively unexplored. In this paper, we put forward Spiking Denoising Diffusion Probabilistic Models (SDDPM), a new class of SNN-based generative models that achieve high sample quality. To fully exploit the energy efficiency of SNNs, we propose a purely Spiking U-Net architecture, which achieves performance comparable to its ANN counterpart using only 4 time steps, resulting in significantly reduced energy consumption. Extensive experimental results reveal that our approach achieves state-of-the-art results on generative tasks and substantially outperforms other SNN-based generative models, achieving up to $12\times$ and $6\times$ improvement on the CIFAR-10 and CelebA datasets, respectively. Moreover, we propose a threshold-guided strategy that can further improve performance by 16.7% in a training-free manner. SDDPM marks a significant advancement in the field of SNN generation, injecting new perspectives and potential avenues of exploration.
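
As background on the binary spike dynamics a spiking U-Net builds on, here is a minimal leaky integrate-and-fire (LIF) neuron run for the same 4 time steps mentioned in the abstract. The membrane time constant and threshold are illustrative assumptions, not the paper's values.

```python
# Minimal LIF sketch over T = 4 time steps, showing the binary spike outputs.
import torch

def lif_forward(inputs, tau=2.0, v_threshold=1.0):
    # inputs: (T, batch, features) presynaptic currents over T time steps
    v = torch.zeros_like(inputs[0])
    spikes = []
    for x_t in inputs:
        v = v + (x_t - v) / tau             # leaky membrane-potential update
        spike = (v >= v_threshold).float()  # binary spike when threshold is crossed
        v = v * (1.0 - spike)               # hard reset after a spike
        spikes.append(spike)
    return torch.stack(spikes)              # (T, batch, features) of 0/1 spikes

out = lif_forward(torch.rand(4, 2, 8))      # T=4, batch=2, 8 features
print(out.shape, out.unique())
```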

* Under Review 

Graph Sampling-based Meta-Learning for Molecular Property Prediction

Jun 29, 2023
Xiang Zhuang, Qiang Zhang, Bin Wu, Keyan Ding, Yin Fang, Huajun Chen

Molecular properties are usually observed for only a limited number of samples, and researchers have therefore treated property prediction as a few-shot problem. One important fact ignored by prior works is that each molecule can be recorded with several different properties simultaneously. To effectively utilize the many-to-many correlations between molecules and properties, we propose a Graph Sampling-based Meta-learning (GS-Meta) framework for few-shot molecular property prediction. First, we construct a Molecule-Property relation Graph (MPG): molecules and properties are nodes, while property labels decide edges. Then, to utilize the topological information of the MPG, we reformulate an episode in meta-learning as a subgraph of the MPG, containing a target property node, molecule nodes, and auxiliary property nodes. Third, as episodes in the form of subgraphs are no longer independent of each other, we propose to schedule the subgraph sampling process with a contrastive loss function that considers the consistency and discrimination of subgraphs. Extensive experiments on 5 commonly used benchmarks show that GS-Meta consistently outperforms state-of-the-art methods by 5.71%-6.93% in ROC-AUC and verify the effectiveness of each proposed module. Our code is available at https://github.com/HICAI-ZJU/GS-Meta.
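
To make the MPG and the episode-as-subgraph idea concrete, here is a toy sketch. The data, the sampling sizes, and the use of random choice are made up for illustration; the actual GS-Meta scheduler drives sampling with a contrastive loss (see the released code at the link above).

```python
# Toy Molecule-Property Graph (MPG) and one episode sampled as a subgraph
# around a target property node.
import random

# Bipartite MPG: edge (molecule, property) labelled 1/0 by the recorded value.
edges = {("mol_a", "toxicity"): 1, ("mol_a", "solubility"): 0,
         ("mol_b", "toxicity"): 0, ("mol_b", "solubility"): 1,
         ("mol_c", "toxicity"): 1}

def sample_episode(target_property, n_support=2, n_aux=1):
    molecules = [m for (m, p) in edges if p == target_property]
    support = random.sample(molecules, min(n_support, len(molecules)))
    other_props = sorted({p for (_, p) in edges if p != target_property})
    auxiliary = random.sample(other_props, min(n_aux, len(other_props)))
    # Episode subgraph: target property node, sampled molecule nodes,
    # auxiliary property nodes, and the labelled edges among them.
    sub_edges = {e: y for e, y in edges.items()
                 if e[0] in support and e[1] in {target_property, *auxiliary}}
    return {"target": target_property, "molecules": support,
            "auxiliary": auxiliary, "edges": sub_edges}

print(sample_episode("toxicity"))
```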

* Accepted by IJCAI 2023 

Knowledge-Enhanced Hierarchical Information Correlation Learning for Multi-Modal Rumor Detection

Jun 28, 2023
Jiawei Liu, Jingyi Xie, Fanrui Zhang, Qiang Zhang, Zheng-jun Zha

The explosive growth of rumors with text and images on social media platforms has drawn great attention. Existing studies have made significant contributions to cross-modal information interaction and fusion, but they fail to fully explore the hierarchical and complex semantic correlations across different modality content, severely limiting their performance on detecting multi-modal rumors. In this work, we propose a novel knowledge-enhanced hierarchical information correlation learning approach (KhiCL) for multi-modal rumor detection that jointly models basic semantic correlation and high-order knowledge-enhanced entity correlation. Specifically, KhiCL exploits a cross-modal joint dictionary to transfer heterogeneous unimodal features into a common feature space and captures basic cross-modal semantic consistency and inconsistency with a cross-modal fusion layer. Moreover, considering that multi-modal content is narrated around entities, KhiCL extracts visual and textual entities from images and text, designs a knowledge relevance reasoning strategy to find the shortest semantically relevant path between each pair of entities in an external knowledge graph, and absorbs the complementary contextual knowledge of all other connected entities along this path to learn knowledge-enhanced entity representations. Furthermore, KhiCL utilizes a signed attention mechanism to model the knowledge-enhanced entity consistency and inconsistency of intra-modality and inter-modality entity pairs by measuring their corresponding semantic relevance distance. Extensive experiments demonstrate the effectiveness of the proposed method.
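
The notion of "signed" attention can be pictured as attention whose weights keep their sign, so entity pairs can contribute negatively (inconsistency) as well as positively (consistency). The sketch below is an assumed reading of that idea, not KhiCL's actual formulation.

```python
# Illustrative signed attention: a tanh over similarity keeps the sign instead
# of forcing non-negative softmax weights. Function name and normalization are
# assumptions for illustration only.
import torch

def signed_attention(query, keys, values):
    # query: (d,), keys/values: (n, d)
    scores = torch.tanh(keys @ query / query.shape[0] ** 0.5)  # in [-1, 1], sign preserved
    weights = scores / (scores.abs().sum() + 1e-8)             # normalize by magnitude only
    return weights @ values, weights

q = torch.randn(16)
k = torch.randn(4, 16)
out, w = signed_attention(q, k, k)
print(out.shape, w)
```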

Hand Pose Estimation with Mems-Ultrasonic Sensors

Jun 22, 2023
Qiang Zhang, Yuanqiao Lin, Yubin Lin, Szymon Rusinkiewicz

Hand tracking is an important aspect of human-computer interaction and has a wide range of applications in extended reality devices. However, current hand motion capture methods suffer from various limitations. For instance, visual hand pose estimation is susceptible to self-occlusion and changes in lighting conditions, while IMU-based tracking gloves experience significant drift and are not resistant to external magnetic field interference. To address these issues, we propose a novel and low-cost hand-tracking glove that utilizes several MEMS-ultrasonic sensors attached to the fingers to measure the distance matrix among the sensors. Our lightweight deep network then reconstructs the hand pose from the distance matrix. Our experimental results demonstrate that this approach is accurate, size-agnostic, and robust to external interference. We also present the design logic for the sensor selection, sensor configuration, circuit diagram, and model architecture.
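
The pipeline of "distance matrix in, hand pose out" can be sketched with a small regression network. The sensor count, joint parameterization, and layer sizes below are illustrative assumptions, not the glove's actual configuration or the paper's network.

```python
# Minimal sketch: regress a hand pose from a pairwise sensor distance matrix.
import torch
import torch.nn as nn

NUM_SENSORS = 6            # assumed number of MEMS-ultrasonic sensors
NUM_JOINT_PARAMS = 21 * 3  # assumed 21 keypoints x 3D coordinates

model = nn.Sequential(
    nn.Flatten(),                                  # (B, S, S) -> (B, S*S) distance matrix
    nn.Linear(NUM_SENSORS * NUM_SENSORS, 128),
    nn.ReLU(),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Linear(128, NUM_JOINT_PARAMS),              # regressed joint coordinates
)

distances = torch.rand(4, NUM_SENSORS, NUM_SENSORS)  # batch of 4 distance matrices
pose = model(distances).view(4, 21, 3)
print(pose.shape)
```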

Ask an Expert: Leveraging Language Models to Improve Strategic Reasoning in Goal-Oriented Dialogue Models

May 29, 2023
Qiang Zhang, Jason Naradowsky, Yusuke Miyao

Existing dialogue models may encounter scenarios that are not well represented in the training data and, as a result, generate responses that are unnatural, inappropriate, or unhelpful. We propose the "Ask an Expert" framework, in which the model is trained with access to an "expert" that it can consult at each turn. Advice is solicited via a structured dialogue with the expert, and the model is optimized to selectively utilize (or ignore) it given the context and dialogue history. In this work the expert takes the form of an LLM. We evaluate this framework in a mental health support domain, where the structure of the expert conversation is outlined by pre-specified prompts which reflect a reasoning strategy taught to practitioners in the field. Blenderbot models utilizing "Ask an Expert" show quality improvements across all expert sizes, including those with fewer parameters than the dialogue model itself. Our best model provides a $\sim 10\%$ improvement over baselines, approaching human-level scores on "engagingness" and "helpfulness" metrics.
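
The per-turn consult loop can be summarized schematically as below. The two call_* functions are placeholders; in the actual system the expert is an LLM queried with pre-specified prompts and the dialogue model is a Blenderbot variant trained to use or ignore the advice.

```python
# Schematic sketch of one "Ask an Expert" turn: consult the expert, then let
# the dialogue model respond conditioned on the advice and history.
def call_expert(prompt: str) -> str:
    return "Acknowledge the feeling, then ask an open question."  # stand-in advice

def call_dialogue_model(history: list[str], advice: str) -> str:
    # A trained model would be optimized to selectively use or ignore the advice.
    return f"(guided by: {advice}) That sounds really hard. What happened next?"

def take_turn(history: list[str]) -> str:
    expert_prompt = "User said: " + history[-1] + "\nHow should a supporter respond?"
    advice = call_expert(expert_prompt)           # structured consultation with the expert
    return call_dialogue_model(history, advice)   # response conditioned on advice + history

print(take_turn(["I've been feeling overwhelmed at work lately."]))
```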

* Accepted in Findings of the Association for Computational Linguistics: ACL 2023 

A Survey on Asking Clarification Questions Datasets in Conversational Systems

May 25, 2023
Hossein A. Rahmani, Xi Wang, Yue Feng, Qiang Zhang, Emine Yilmaz, Aldo Lipani

The ability to understand a user's underlying needs is critical for conversational systems, especially with limited input from users in a conversation. In such a setting, Asking Clarification Questions (ACQs) to reveal users' true intent from their queries or utterances arises as an essential task. However, a key limitation of existing ACQ studies is their incomparability, stemming from inconsistent use of data, distinct experimental setups, and differing evaluation strategies. Therefore, in this paper, to assist the development of ACQ techniques, we comprehensively analyse the current state of ACQ research, offering a detailed comparison of publicly available datasets and a discussion of the applied evaluation metrics, together with benchmarks for multiple ACQ-related tasks. In particular, based on a thorough analysis of the ACQ task, we discuss a number of corresponding research directions for the investigation of ACQs as well as the development of conversational systems.

* ACL 2023, 17 pages 

Newton-Cotes Graph Neural Networks: On the Time Evolution of Dynamic Systems

May 24, 2023
Lingbing Guo, Weiqing Wang, Zhuo Chen, Ningyu Zhang, Zequn Sun, Yixuan Lai, Qiang Zhang, Huajun Chen

Reasoning about system dynamics is one of the most important analytical approaches in many scientific studies. Given the initial state of a system as input, recent graph neural network (GNN)-based methods can predict the future state far ahead in time with high accuracy. Although these methods differ in how they model the coordinates and interacting forces of the system, we show that they actually share a common paradigm: learning the integration of the velocity over the interval between the initial and terminal coordinates, with an integrand that is constant w.r.t. time. Inspired by this observation, we propose a new approach that predicts the integration from several velocity estimations with Newton-Cotes formulas and prove its effectiveness theoretically. Extensive experiments on several benchmarks empirically demonstrate consistent and significant improvement compared with the state-of-the-art methods.
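
The contrast between a constant-velocity integrand and a Newton-Cotes combination of several velocity estimates can be seen in a short worked example. The velocity function below is synthetic; in the paper's setting a GNN would supply the intermediate velocity estimates.

```python
# Worked sketch: advance a coordinate by integrating velocity with Newton-Cotes
# weights (trapezoidal or Simpson) instead of a single constant velocity.
import numpy as np

def newton_cotes_step(x0, velocity_fn, t0, t1, order=2):
    """Advance x0 from t0 to t1 using (order+1) velocity samples."""
    ts = np.linspace(t0, t1, order + 1)
    vs = np.array([velocity_fn(t) for t in ts])
    if order == 1:                       # trapezoidal rule
        weights = np.array([0.5, 0.5])
    elif order == 2:                     # Simpson's rule
        weights = np.array([1.0, 4.0, 1.0]) / 6.0
    else:
        raise ValueError("only order 1 or 2 in this sketch")
    return x0 + (t1 - t0) * (weights @ vs)

velocity = lambda t: np.cos(t)           # true trajectory: x(t) = sin(t)
x1 = newton_cotes_step(0.0, velocity, 0.0, 1.0, order=2)
print(x1, "vs exact", np.sin(1.0))       # Simpson is far closer than constant velocity
```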

* Under review 

GeoGLUE: A GeoGraphic Language Understanding Evaluation Benchmark

May 11, 2023
Dongyang Li, Ruixue Ding, Qiang Zhang, Zheng Li, Boli Chen, Pengjun Xie, Yao Xu, Xin Li, Ning Guo, Fei Huang, Xiaofeng He

With the rapid development of geographic applications, automatable and intelligent models are essential for handling the large volume of information. However, few researchers focus on geographic natural language processing, and there has never been a benchmark that establishes a unified standard. In this work, we propose a GeoGraphic Language Understanding Evaluation benchmark, named GeoGLUE. We collect data from openly released geographic resources and introduce six natural language understanding tasks: geographic textual similarity on recall, geographic textual similarity on rerank, geographic elements tagging, geographic composition analysis, geographic where-what cut, and geographic entity alignment. We also provide evaluation experiments and analysis of general baselines, indicating the effectiveness and significance of the GeoGLUE benchmark.
