Peng Liu

NLEBench+NorGLM: A Comprehensive Empirical Analysis and Benchmark Dataset for Generative Language Models in Norwegian

Dec 03, 2023
Peng Liu, Lemei Zhang, Terje Nissen Farup, Even W. Lauvrak, Jon Espen Ingvaldsen, Simen Eide, Jon Atle Gulla, Zhirong Yang

Recent advancements in Generative Language Models (GLMs) have transformed Natural Language Processing (NLP) by showcasing the effectiveness of the "pre-train, prompt, and predict" paradigm in utilizing pre-trained GLM knowledge for diverse applications. Despite their potential, these capabilities lack adequate quantitative characterization due to the absence of comprehensive benchmarks, particularly for low-resource languages. Existing low-resource benchmarks focus on discriminative language models like BERT, neglecting the evaluation of generative language models. Moreover, current benchmarks often overlook measuring generalization performance across multiple tasks, a crucial metric for GLMs. To bridge these gaps, we introduce NLEBench, a comprehensive benchmark tailored for evaluating natural language generation capabilities in Norwegian, a low-resource language. We use Norwegian as a case study to explore whether current GLMs and benchmarks in mainstream languages like English can reveal the unique characteristics of underrepresented languages. NLEBench encompasses a suite of real-world NLP tasks, ranging from news storytelling, summarization, open-domain conversation, natural language understanding, instruction fine-tuning, and toxicity and bias evaluation to a self-curated Chain-of-Thought investigation. It features two high-quality, human-annotated datasets: an instruction dataset covering traditional Norwegian culture, idioms, slang, and special expressions, and a document-grounded multi-label dataset for topic classification, question answering, and summarization. This paper also introduces foundational Norwegian Generative Language Models (NorGLMs) developed with diverse parameter scales and Transformer-based architectures. Systematic evaluations on the proposed benchmark suite provide insights into the capabilities and scalability of NorGLMs across various downstream tasks.
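
The "pre-train, prompt, and predict" paradigm the abstract refers to can be illustrated in a few lines. Below is a minimal, hypothetical sketch of zero-shot prompting for Norwegian summarization with the Hugging Face transformers library; the checkpoint name and the prompt template are illustrative assumptions, not the paper's actual setup.

```python
# Hypothetical sketch of "pre-train, prompt, and predict" for Norwegian
# summarization. Checkpoint name and template are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "NorGLM/NorGPT-369M"  # illustrative checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

article = "..."  # a Norwegian news article from the benchmark
prompt = f"Artikkel: {article}\nSammendrag:"  # "Article: ... Summary:"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80, do_sample=False)
# Decode only the newly generated tokens, i.e. the summary.
summary = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                           skip_special_tokens=True)
print(summary)
```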

Robust Visual Imitation Learning with Inverse Dynamics Representations

Oct 22, 2023
Siyuan Li, Xun Wang, Rongchang Zuo, Kewu Sun, Lingfei Cui, Jishiyu Ding, Peng Liu, Zhe Ma

Imitation learning (IL) has achieved considerable success in solving complex sequential decision-making problems. However, current IL methods typically assume that the environment in which the policy is learned is identical to the environment in which the expert data were collected. They may therefore fail when the two environments differ even slightly, especially in challenging problems with high-dimensional image observations. Yet in real-world scenarios it is rare to be able to collect expert trajectories precisely in the target learning environment. To address this challenge, we propose a novel robust imitation learning approach in which an inverse-dynamics state-representation learning objective aligns the expert environment with the learning environment. With this abstract state representation, we design an effective reward function that measures the similarity between behavior data and expert data not only element-wise but also at the trajectory level. We conduct extensive experiments to evaluate the proposed approach under various visual perturbations and in diverse visual control tasks. Our approach achieves near-expert performance in most environments and significantly outperforms state-of-the-art visual IL and robust IL methods.
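
As a rough illustration of the inverse-dynamics representation objective described above, the following PyTorch sketch trains an encoder so that the action can be predicted from the embeddings of consecutive observations; the network sizes and dimensions are illustrative assumptions, not the paper's design.

```python
# Minimal sketch of an inverse-dynamics representation objective.
# Sizes below (obs_dim=32, emb_dim=64, act_dim=4) are assumptions.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, obs_dim, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, emb_dim))

    def forward(self, obs):
        return self.net(obs)

class InverseDynamics(nn.Module):
    def __init__(self, emb_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * emb_dim, 128), nn.ReLU(),
                                 nn.Linear(128, act_dim))

    def forward(self, z_t, z_next):
        return self.net(torch.cat([z_t, z_next], dim=-1))

encoder = Encoder(obs_dim=32)
inv_model = InverseDynamics(emb_dim=64, act_dim=4)
params = list(encoder.parameters()) + list(inv_model.parameters())
optimizer = torch.optim.Adam(params, lr=3e-4)

def inverse_dynamics_loss(obs_t, obs_next, action):
    # Predict the action that caused the transition; this pushes the
    # encoder to keep control-relevant information and discard visual
    # nuisance factors that differ between environments.
    z_t, z_next = encoder(obs_t), encoder(obs_next)
    return nn.functional.mse_loss(inv_model(z_t, z_next), action)

# One illustrative update on a dummy batch of transitions.
obs_t, obs_next = torch.randn(8, 32), torch.randn(8, 32)
action = torch.randn(8, 4)
loss = inverse_dynamics_loss(obs_t, obs_next, action)
loss.backward()
optimizer.step()
```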

GPS Attack Detection and Mitigation for Safe Autonomous Driving using Image and Map based Lateral Direction Localization

Oct 09, 2023
Qingming Chen, Peng Liu, Guoqiang Li, Zhenpo Wang

The accuracy and robustness of vehicle localization are critical for achieving safe and reliable high-level autonomy. Recent results show that GPS is vulnerable to spoofing attacks, which pose a major threat to autonomous driving. In this paper, a novel anomaly detection and mitigation method against GPS attacks, utilizing an onboard camera and high-precision maps, is proposed to ensure accurate vehicle localization. First, the lateral position within the driving lane is calculated by camera-based lane detection and by map matching, respectively. Then, a real-time detector for GPS spoofing attacks is developed to evaluate the localization data. When an attack is detected, a multi-source fusion-based localization method using an unscented Kalman filter is derived to mitigate the GPS attack and improve localization accuracy. The proposed method is validated in various scenarios in the CARLA simulator and on an open-source public dataset, demonstrating its effectiveness in timely GPS attack detection and data recovery.
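
The detection idea, comparing the lateral in-lane position implied by GPS plus the map against the one measured by camera-based lane detection, can be sketched as follows; the threshold and window length are illustrative assumptions, not the paper's tuned values.

```python
# Minimal sketch: flag a GPS spoofing attack when the GPS/map lateral
# offset persistently disagrees with the camera-measured offset.
from collections import deque

THRESHOLD_M = 0.5   # allowed lateral disagreement in meters (assumed)
WINDOW = 10         # consecutive frames before declaring an attack

residuals = deque(maxlen=WINDOW)

def gps_attack_detected(gps_lateral_offset_m, camera_lateral_offset_m):
    """Return True once the GPS/camera disagreement has persisted for
    a full window of frames, i.e. every recent residual is too large."""
    residuals.append(abs(gps_lateral_offset_m - camera_lateral_offset_m))
    return len(residuals) == WINDOW and min(residuals) > THRESHOLD_M
```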

Exploring Small Language Models with Prompt-Learning Paradigm for Efficient Domain-Specific Text Classification

Sep 26, 2023
Hengyu Luo, Peng Liu, Stefan Esping

Domain-specific text classification faces the challenge of scarce labeled data due to the high cost of manual labeling. Prompt-learning, known for its efficiency in few-shot scenarios, has been proposed as an alternative to traditional fine-tuning. Moreover, although large language models (LLMs) have gained prominence, small language models (SLMs, with under 1B parameters) offer significant customizability, adaptability, and cost-effectiveness for domain-specific tasks, given industry constraints. In this study, we investigate the potential of SLMs combined with the prompt-learning paradigm for domain-specific text classification, specifically within customer-agent interactions in retail. Our evaluations show that, in few-shot settings where prompt-based model fine-tuning is possible, T5-base, a typical SLM with 220M parameters, achieves approximately 75% accuracy with limited labeled data (up to 15% of the full data), which demonstrates the great potential of SLMs with prompt-learning. Building on this, we further validate the effectiveness of active few-shot sampling and an ensemble strategy in the prompt-learning pipeline, both of which contribute to a remarkable performance gain. In zero-shot settings with a fixed model, we underscore a pivotal observation: although GPT-3.5-turbo, equipped with around 154B parameters, garners an accuracy of 55.16%, the power of well-designed prompts becomes evident when FLAN-T5-large, a model with a mere 0.5% of GPT-3.5-turbo's parameters, achieves an accuracy exceeding 31% with an optimized prompt, a leap from its sub-18% performance with an unoptimized one. Our findings underscore the promise of prompt-learning in classification tasks with SLMs, emphasizing the benefits of active few-shot sampling and ensemble strategies in few-shot settings and the importance of prompt engineering in zero-shot settings.
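
A minimal sketch of prompt-based fine-tuning with a small seq2seq model such as T5-base is shown below; the template and the label words are illustrative assumptions, not the paper's actual prompts or data.

```python
# Minimal sketch of prompt-based few-shot fine-tuning of an SLM: the
# input is wrapped in a natural-language template and the class label
# is verbalized as the generation target. Template/labels are assumed.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# A tiny few-shot set with hypothetical label words.
few_shot = [("I still have not received my refund.", "complaint"),
            ("Thanks, that solved my issue!", "praise")]

model.train()
for text, label in few_shot:
    prompt = f"Classify the customer message: {text} Category:"
    inputs = tokenizer(prompt, return_tensors="pt")
    targets = tokenizer(label, return_tensors="pt").input_ids
    loss = model(**inputs, labels=targets).loss  # teacher-forced LM loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```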

* 10 pages excluding appendix and references 

How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection

Aug 25, 2023
Yiyang Yao, Peng Liu, Tiancheng Zhao, Qianqian Zhang, Jiajia Liao, Chunxin Fang, Kyusong Lee, Qing Wang

Object detection (OD) in computer vision has made significant progress in recent years, transitioning from closed-set labels to open-vocabulary detection (OVD) based on large-scale vision-language pre-training (VLP). However, current evaluation methods and datasets are limited to testing generalization over object types and referral expressions, which do not provide a systematic, fine-grained, and accurate benchmark of OVD models' abilities. In this paper, we propose a new benchmark named OVDEval, which includes 9 sub-tasks and introduces evaluations on commonsense knowledge, attribute understanding, position understanding, object relation comprehension, and more. The dataset is meticulously created to provide hard negatives that challenge models' true understanding of visual and linguistic input. Additionally, we identify a problem with the popular Average Precision (AP) metric when benchmarking models on these fine-grained label datasets and propose a new metric called Non-Maximum Suppression Average Precision (NMS-AP) to address this issue. Extensive experimental results show that existing top OVD models all fail on the new tasks except for simple object types, demonstrating the value of the proposed dataset in pinpointing the weaknesses of current OVD models and guiding future research. Furthermore, the proposed NMS-AP metric is verified by experiments to provide a much more truthful evaluation of OVD models, whereas traditional AP metrics yield deceptive results. Data is available at https://github.com/om-ai-lab/OVDEval.
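
The intuition behind NMS-AP can be sketched as class-agnostic non-maximum suppression applied to a model's predictions before computing AP, so that near-duplicate boxes differing only in their fine-grained label cannot all be counted; the IoU threshold and data layout below are assumptions, not the benchmark's exact procedure.

```python
# Sketch of the pre-processing idea behind NMS-AP: keep only the
# highest-scoring box among overlapping predictions, regardless of
# label, then compute AP as usual on the surviving boxes.
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def class_agnostic_nms(preds, iou_thr=0.5):
    """preds: list of (box, score, label) tuples. A wrong-label box
    overlapping a kept box is suppressed instead of counted for AP."""
    keep = []
    for box, score, label in sorted(preds, key=lambda p: -p[1]):
        if all(iou(box, kept_box) < iou_thr for kept_box, _, _ in keep):
            keep.append((box, score, label))
    return keep
```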

IOB: Integrating Optimization Transfer and Behavior Transfer for Multi-Policy Reuse

Aug 14, 2023
Siyuan Li, Hao Li, Jin Zhang, Zhen Wang, Peng Liu, Chongjie Zhang

Humans have the ability to reuse previously learned policies to solve new tasks quickly, and reinforcement learning (RL) agents can do the same by transferring knowledge from source policies to a related target task. Transfer RL methods can reshape the policy optimization objective (optimization transfer) or influence the behavior policy (behavior transfer) using source policies. However, selecting the appropriate source policy with limited samples to guide target policy learning has been a challenge. Previous methods introduce additional components, such as hierarchical policies or estimations of source policies' value functions, which can lead to non-stationary policy optimization or heavy sampling costs, diminishing transfer effectiveness. To address this challenge, we propose a novel transfer RL method that selects the source policy without training extra components. Our method utilizes the Q function in the actor-critic framework to guide policy selection, choosing the source policy with the largest one-step improvement over the current target policy. We integrate optimization transfer and behavior transfer (IOB) by regularizing the learned policy to mimic the guidance policy and combining them as the behavior policy. This integration significantly enhances transfer effectiveness, surpasses state-of-the-art transfer RL baselines in benchmark tasks, and improves final performance and knowledge transferability in continual learning scenarios. Additionally, we show that our optimization transfer technique is guaranteed to improve target policy learning.
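
The guidance-policy selection rule described above admits a compact sketch: among the source policies and the current target policy, pick the one whose proposed action the target task's critic values most highly. All names and signatures below are illustrative assumptions, not the paper's implementation.

```python
# Sketch of Q-guided source policy selection in an actor-critic setup.
# q_critic(state, action) is assumed to return a scalar tensor.
import torch

def select_guidance_policy(state, source_policies, target_policy, q_critic):
    """Return the policy with the largest one-step improvement, i.e.
    whose action at this state the target task's Q function rates best."""
    candidates = list(source_policies) + [target_policy]
    with torch.no_grad():
        values = [q_critic(state, pi(state)) for pi in candidates]
    return candidates[int(torch.stack(values).argmax())]
```

The selected policy then serves both roles at once: the learned policy is regularized to mimic it (optimization transfer), and it is mixed into the behavior policy that collects data (behavior transfer).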

* 26 pages, 9 figures 

ChatGPT for Software Security: Exploring the Strengths and Limitations of ChatGPT in the Security Applications

Aug 10, 2023
Zhilong Wang, Lan Zhang, Peng Liu

ChatGPT, as a versatile large language model, has demonstrated remarkable potential in addressing inquiries across various domains. Its ability to analyze, comprehend, and synthesize information from both online sources and user inputs has garnered significant attention. Previous research has explored ChatGPT's competence in code generation and code reviews. In this paper, we delve into ChatGPT's capabilities in security-oriented program analysis, focusing on perspectives from both attackers and security analysts. We present a case study involving several security-oriented program analysis tasks while deliberately introducing challenges to assess ChatGPT's responses. Through an examination of the quality of answers provided by ChatGPT, we gain a clearer understanding of its strengths and limitations in the realm of security-oriented program analysis.

* 1 table, 8 figures 

Double-chain Constraints for 3D Human Pose Estimation in Images and Videos

Aug 10, 2023
Hongbo Kang, Yong Wang, Mengyuan Liu, Doudou Wu, Peng Liu, Wenming Yang

Reconstructing 3D poses from 2D poses that lack depth information is particularly challenging due to the complexity and diversity of human motion. The key is to effectively model the spatial constraints between joints so as to leverage their inherent dependencies. We therefore propose a novel model, the Double-chain Graph Convolutional Transformer (DC-GCT), which constrains the pose through a double-chain design consisting of local-to-global and global-to-local chains, yielding a representation better suited to the current human pose. Specifically, we combine the advantages of GCNs and Transformers, designing a Local Constraint Module (LCM) based on a GCN, a Global Constraint Module (GCM) based on the self-attention mechanism, and a Feature Interaction Module (FIM). The proposed method fully captures the multi-level dependencies between human body joints to optimize the model's capacity. Moreover, we propose a way to incorporate temporal information into the single-frame model by guiding the video sequence embedding through the joint embedding of the target frame, with a negligible increase in computational cost. Experimental results demonstrate that DC-GCT achieves state-of-the-art performance on two challenging datasets (Human3.6M and MPI-INF-3DHP). Notably, our model achieves state-of-the-art performance on all action categories of the Human3.6M dataset using detected 2D poses from CPN. Our code is available at: https://github.com/KHB1698/DC-GCT.
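
A rough sketch of the two constraint modules, a graph convolution over the skeleton adjacency (local) and self-attention over all joints (global), is given below in PyTorch; the dimensions, placeholder adjacency, and additive fusion are illustrative assumptions, not the actual DC-GCT design.

```python
# Sketch of a local (GCN) and a global (self-attention) constraint
# module over per-joint features. Sizes and fusion are assumptions.
import torch
import torch.nn as nn

class LocalConstraint(nn.Module):
    def __init__(self, dim, adj):            # adj: (J, J) skeleton graph
        super().__init__()
        self.register_buffer("adj", adj / adj.sum(-1, keepdim=True))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                     # x: (B, J, dim)
        # Each joint aggregates features from its skeletal neighbors.
        return torch.relu(self.adj @ self.proj(x))

class GlobalConstraint(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x)            # every joint attends to all
        return out

J, dim = 17, 64                                # 17 joints, feature width 64
adj = torch.eye(J)                             # placeholder adjacency
x = torch.randn(2, J, dim)                     # dummy batch of joint features
fused = LocalConstraint(dim, adj)(x) + GlobalConstraint(dim)(x)
```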

Sensing Aided Covert Communications: Turning Interference into Allies

Jul 21, 2023
Xinyi Wang, Zesong Fei, Peng Liu, J. Andrew Zhang, Qingqing Wu, Nan Wu

In this paper, we investigate the realization of covert communication in a general radar-communication cooperation system, which includes integrated sensing and communications as a special case. We explore the possibility of utilizing the radar's sensing ability to track and jam an aerial adversary target attempting to detect the transmission. Based on the echoes from the target, extended Kalman filtering is employed to predict its trajectory as well as the corresponding channels. Depending on the maneuvering altitude of the adversary target, two channel models are considered, with the aim of maximizing the covert transmission rate by jointly designing the radar waveform and the communication transmit beamforming vector based on the constructed channels. For the free-space propagation model, by decoupling the joint design, we propose an efficient algorithm that guarantees the target cannot detect the transmission. For the Rician fading model, since the multi-path components cannot be estimated, a robust joint transmission scheme is proposed based on properties of the Kullback-Leibler divergence. The convergence behaviour, tracking MSE, false-alarm and missed-detection probabilities, and covert transmission rate are evaluated. Simulation results show that the proposed algorithms achieve accurate tracking. For both channel models, the proposed sensing-assisted covert transmission design guarantees covertness and significantly outperforms conventional schemes.
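
The tracking step can be illustrated with a toy constant-velocity Kalman filter predicting the target's position from radar echoes; with this linear model the extended Kalman filter reduces to the standard form, and all matrices below are illustrative assumptions rather than the paper's system model.

```python
# Toy constant-velocity (extended) Kalman filter for target tracking.
# State x = [px, py, vx, vy]; the radar measures position only.
import numpy as np

dt = 0.1
F = np.block([[np.eye(2), dt * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])   # state transition
H = np.hstack([np.eye(2), np.zeros((2, 2))])    # position measurement
Q, R = 0.01 * np.eye(4), 0.1 * np.eye(2)        # assumed noise covariances

x, P = np.zeros(4), np.eye(4)

def ekf_step(z):
    """One predict/update cycle given a position measurement z (2,)."""
    global x, P
    x, P = F @ x, F @ P @ F.T + Q               # predict
    S = H @ P @ H.T + R                         # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)              # Kalman gain
    x = x + K @ (z - H @ x)                     # update with innovation
    P = (np.eye(4) - K @ H) @ P
    return x[:2]                                # estimated position

estimate = ekf_step(np.array([1.0, 2.0]))       # feed one radar echo
```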

* 13 pages, 12 figures, submitted to IEEE journals for potential publication 