Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shuo Liu

UDKAG: Augmenting Large Vision-Language Models with Up-to-Date Knowledge

May 23, 2024

Chuanhao Li, Zhen Li, Chenchen Jing, Shuo Liu, Wenqi Shao, Yuwei Wu, Ping Luo, Yu Qiao, Kaipeng Zhang

Large vision-language models (LVLMs) are ignorant of the up-to-date knowledge, such as LLaVA series, because they cannot be updated frequently due to the large amount of resources required, and therefore fail in many cases. For example, if a LVLM was released on January 2024, and it wouldn't know the detailed plot of the new movie Dune 2, which wasn't released until February 2024. To solve the problem, a promising solution is to provide LVLMs with up-to-date knowledge via internet search during inference, i.e., internet-augmented generation (IAG), which is already integrated in some closed-source commercial LVLMs such as GPT-4V. However, the specific mechanics underpinning them remain a mystery. In this paper, we propose a plug-and-play framework, for augmenting existing LVLMs in handling visual question answering (VQA) about up-to-date knowledge, dubbed UDKAG. A hierarchical filtering model is trained to effectively and efficiently find the most helpful content from the websites returned by a search engine to prompt LVLMs with up-to-date knowledge. To train the model and evaluate our framework's performance, we propose a pipeline to automatically generate news-related VQA samples to construct a dataset, dubbed UDK-VQA. A multi-model voting mechanism is introduced to label the usefulness of website/content for VQA samples to construct the training set. Experimental results demonstrate the effectiveness of our framework, outperforming GPT-4V by about 25% in accuracy.

* 12 pages, 6 figures, a framework to augment large vision-language models with up-to-date knowledge

Via

Access Paper or Ask Questions

AnomalyLLM: Few-shot Anomaly Edge Detection for Dynamic Graphs using Large Language Models

May 13, 2024

Shuo Liu, Di Yao, Lanting Fang, Zhetao Li, Wenbin Li, Kaiyu Feng, XiaoWen Ji, Jingping Bi

Detecting anomaly edges for dynamic graphs aims to identify edges significantly deviating from the normal pattern and can be applied in various domains, such as cybersecurity, financial transactions and AIOps. With the evolving of time, the types of anomaly edges are emerging and the labeled anomaly samples are few for each type. Current methods are either designed to detect randomly inserted edges or require sufficient labeled data for model training, which harms their applicability for real-world applications. In this paper, we study this problem by cooperating with the rich knowledge encoded in large language models(LLMs) and propose a method, namely AnomalyLLM. To align the dynamic graph with LLMs, AnomalyLLM pre-trains a dynamic-aware encoder to generate the representations of edges and reprograms the edges using the prototypes of word embeddings. Along with the encoder, we design an in-context learning framework that integrates the information of a few labeled samples to achieve few-shot anomaly detection. Experiments on four datasets reveal that AnomalyLLM can not only significantly improve the performance of few-shot anomaly detection, but also achieve superior results on new anomalies without any update of model parameters.

* 13pages

Via

Access Paper or Ask Questions

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Apr 24, 2024

Figure 1 for MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Figure 2 for MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Figure 3 for MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Figure 4 for MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Large Vision-Language Models (LVLMs) show significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation. However, existing multimodal evaluation benchmarks cover a limited number of multimodal tasks testing rudimentary capabilities, falling short in tracking LVLM development. In this study, we present MMT-Bench, a comprehensive benchmark designed to assess LVLMs across massive multimodal tasks requiring expert knowledge and deliberate visual recognition, localization, reasoning, and planning. MMT-Bench comprises $31,325$ meticulously curated multi-choice visual questions from various multimodal scenarios such as vehicle driving and embodied navigation, covering $32$ core meta-tasks and $162$ subtasks in multimodal understanding. Due to its extensive task coverage, MMT-Bench enables the evaluation of LVLMs using a task map, facilitating the discovery of in- and out-of-domain tasks. Evaluation results involving $30$ LVLMs such as the proprietary GPT-4V, GeminiProVision, and open-sourced InternVL-Chat, underscore the significant challenges posed by MMT-Bench. We anticipate that MMT-Bench will inspire the community to develop next-generation multimodal foundation models aimed at achieving general-purpose multimodal intelligence.

* 77 pages, 41 figures

Via

Access Paper or Ask Questions

Exploring and Unleashing the Power of Large Language Models in Automated Code Translation

Apr 23, 2024

Zhen Yang, Fang Liu, Zhongxing Yu, Jacky Wai Keung, Jia Li, Shuo Liu, Yifan Hong, Xiaoxue Ma, Zhi Jin, Ge Li

Code translation tools are developed for automatic source-to-source translation. Although learning-based transpilers have shown impressive enhancement against rule-based counterparts, owing to their task-specific pre-training on extensive monolingual corpora. Their current performance still remains unsatisfactory for practical deployment, and the associated training resources are also prohibitively expensive. LLMs pre-trained on huge amounts of human-written code/text have shown remarkable performance in many code intelligence tasks due to their powerful generality, even without task-specific training. Thus, LLMs can potentially circumvent the above limitations, but they have not been exhaustively explored yet. This paper investigates diverse LLMs and learning-based transpilers for automated code translation tasks, finding that: although certain LLMs have outperformed current transpilers, they still have some accuracy issues, where most of the failures are induced by a lack of comprehension of source programs (38.51%), missing clear instructions on I/O types in translation (14.94%), and ignoring discrepancies between source and target programs (41.38%). Enlightened by the above findings, we propose UniTrans, an Unified code Translation framework, applicable to various LLMs, for unleashing their power in this field. Specifically, UniTrans first craft a series of test cases for target programs with the assistance of source programs. Next, it harnesses the above auto-generated test cases to augment the code translation and then evaluate their correctness via execution. Afterward, UniTrans further (iteratively) repairs incorrectly translated programs prompted by test case execution results. Extensive experiments are conducted on six translation datasets between Python, Java, and C++. Three recent LLMs of diverse sizes are tested with UniTrans, and all achieve substantial improvements.

* 23 pages, 7 figures, accepted by FSE'24 (2024 ACM International Conference on the Foundations of Software Engineering)

Via

Access Paper or Ask Questions

SOS-1K: A Fine-grained Suicide Risk Classification Dataset for Chinese Social Media Analysis

Apr 19, 2024

Hongzhi Qi, Hanfei Liu, Jianqiang Li, Qing Zhao, Wei Zhai, Dan Luo, Tian Yu He, Shuo Liu, Bing Xiang Yang, Guanghui Fu

In the social media, users frequently express personal emotions, a subset of which may indicate potential suicidal tendencies. The implicit and varied forms of expression in internet language complicate accurate and rapid identification of suicidal intent on social media, thus creating challenges for timely intervention efforts. The development of deep learning models for suicide risk detection is a promising solution, but there is a notable lack of relevant datasets, especially in the Chinese context. To address this gap, this study presents a Chinese social media dataset designed for fine-grained suicide risk classification, focusing on indicators such as expressions of suicide intent, methods of suicide, and urgency of timing. Seven pre-trained models were evaluated in two tasks: high and low suicide risk, and fine-grained suicide risk classification on a level of 0 to 10. In our experiments, deep learning models show good performance in distinguishing between high and low suicide risk, with the best model achieving an F1 score of 88.39%. However, the results for fine-grained suicide risk classification were still unsatisfactory, with an weighted F1 score of 50.89%. To address the issues of data imbalance and limited dataset size, we investigated both traditional and advanced, large language model based data augmentation techniques, demonstrating that data augmentation can enhance model performance by up to 4.65% points in F1-score. Notably, the Chinese MentalBERT model, which was pre-trained on psychological domain data, shows superior performance in both tasks. This study provides valuable insights for automatic identification of suicidal individuals, facilitating timely psychological intervention on social media platforms. The source code and data are publicly available.

Via

Access Paper or Ask Questions

Inductive Cognitive Diagnosis for Fast Student Learning in Web-Based Online Intelligent Education Systems

Apr 17, 2024

Shuo Liu, Junhao Shen, Hong Qian, Aimin Zhou

Cognitive diagnosis aims to gauge students' mastery levels based on their response logs. Serving as a pivotal module in web-based online intelligent education systems (WOIESs), it plays an upstream and fundamental role in downstream tasks like learning item recommendation and computerized adaptive testing. WOIESs are open learning environment where numerous new students constantly register and complete exercises. In WOIESs, efficient cognitive diagnosis is crucial to fast feedback and accelerating student learning. However, the existing cognitive diagnosis methods always employ intrinsically transductive student-specific embeddings, which become slow and costly due to retraining when dealing with new students who are unseen during training. To this end, this paper proposes an inductive cognitive diagnosis model (ICDM) for fast new students' mastery levels inference in WOIESs. Specifically, in ICDM, we propose a novel student-centered graph (SCG). Rather than inferring mastery levels through updating student-specific embedding, we derive the inductive mastery levels as the aggregated outcomes of students' neighbors in SCG. Namely, SCG enables to shift the task from finding the most suitable student-specific embedding that fits the response logs to finding the most suitable representations for different node types in SCG, and the latter is more efficient since it no longer requires retraining. To obtain this representation, ICDM consists of a construction-aggregation-generation-transformation process to learn the final representation of students, exercises and concepts. Extensive experiments across real-world datasets show that, compared with the existing cognitive diagnosis methods that are always transductive, ICDM is much more faster while maintains the competitive inference performance for new students.

* WWW 2024

Via

Access Paper or Ask Questions

Safety-Critical Planning and Control for Dynamic Obstacle Avoidance Using Control Barrier Functions

Mar 28, 2024

Shuo Liu, Yihui Mao

Figure 1 for Safety-Critical Planning and Control for Dynamic Obstacle Avoidance Using Control Barrier Functions

Figure 2 for Safety-Critical Planning and Control for Dynamic Obstacle Avoidance Using Control Barrier Functions

Figure 3 for Safety-Critical Planning and Control for Dynamic Obstacle Avoidance Using Control Barrier Functions

Figure 4 for Safety-Critical Planning and Control for Dynamic Obstacle Avoidance Using Control Barrier Functions

Dynamic obstacle avoidance is a challenging topic for optimal control and optimization-based trajectory planning problems, especially when in a tight environment. Many existing works use control barrier functions (CBFs) to enforce safety constraints within control systems. Inside these works, CBFs are usually formulated under model predictive control (MPC) framework to anticipate future states and make informed decisions, or integrated with path planning algorithms as a safety enhancement tool. However, these approaches usually require knowledge of the obstacle boundary equations or have very slow computational efficiency. In this paper, we propose a novel framework to the iterative MPC with discrete-time CBFs (DCBFs) to generate a collision-free trajectory. The DCBFs are obtained from convex polyhedra generated in sequential grid maps, without the need to know the boundary equations of obstacles. Additionally, a path planning algorithm is incorporated into this framework to ensure the global optimality of the generated trajectory. We demonstrate through numerical examples that our framework enables a unicycle robot to safely and efficiently navigate through tight and dynamically changing environments, tackling both convex and nonconvex obstacles with remarkable computing efficiency and reliability in control and trajectory generation.

* 9 pages, 4 figures. arXiv admin note: text overlap with arXiv:2210.04361

Via

Access Paper or Ask Questions

Sensor Misalignment-tolerant AUV Navigation with Passive DoA and Doppler Measurements

Feb 11, 2024

Bingbing Zhang, Shuo Liu, Shanmin Zhou, Daxiong Ji, Tao Wang, Tian Xia, Wen Xu

Figure 1 for Sensor Misalignment-tolerant AUV Navigation with Passive DoA and Doppler Measurements

Figure 2 for Sensor Misalignment-tolerant AUV Navigation with Passive DoA and Doppler Measurements

Figure 3 for Sensor Misalignment-tolerant AUV Navigation with Passive DoA and Doppler Measurements

Figure 4 for Sensor Misalignment-tolerant AUV Navigation with Passive DoA and Doppler Measurements

We present a sensor misalignment-tolerant AUV navigation method that leverages measurements from an acoustic array and dead reckoned information. Recent studies have demonstrated the potential use of passive acoustic Direction of Arrival (DoA) measurements for AUV navigation without requiring ranging measurements. However, the sensor misalignment between the acoustic array and the attitude sensor was not accounted for. Such misalignment may deteriorate the navigation accuracy. This paper proposes a novel approach that allows simultaneous AUV navigation, beacon localization, and sensor alignment. An Unscented Kalman Filter (UKF) that enables the necessary calculations to be completed at an affordable computational load is developed. A Nonlinear Least Squares (NLS)-based technique is employed to find an initial solution for beacon localization and sensor alignment as early as possible using a short-term window of measurements. Experimental results demonstrate the performance of the proposed method.

Via

Access Paper or Ask Questions

Vignat: Vulnerability identification by learning code semantics via graph attention networks

Oct 30, 2023

Shuo Liu, Gail Kaiser

Figure 1 for Vignat: Vulnerability identification by learning code semantics via graph attention networks

Figure 2 for Vignat: Vulnerability identification by learning code semantics via graph attention networks

Figure 3 for Vignat: Vulnerability identification by learning code semantics via graph attention networks

Figure 4 for Vignat: Vulnerability identification by learning code semantics via graph attention networks

Vulnerability identification is crucial to protect software systems from attacks for cyber-security. However, huge projects have more than millions of lines of code, and the complex dependencies make it hard to carry out traditional static and dynamic methods. Furthermore, the semantic structure of various types of vulnerabilities differs greatly and may occur simultaneously, making general rule-based methods difficult to extend. In this paper, we propose \textit{Vignat}, a novel attention-based framework for identifying vulnerabilities by learning graph-level semantic representations of code. We represent codes with code property graphs (CPGs) in fine grain and use graph attention networks (GATs) for vulnerability detection. The results show that Vignat is able to achieve $57.38\%$ accuracy on reliable datasets derived from popular C libraries. Furthermore, the interpretability of our GATs provides valuable insights into vulnerability patterns.

Via

Access Paper or Ask Questions

FishMOT: A Simple and Effective Method for Fish Tracking Based on IoU Matching

Sep 22, 2023

Shuo Liu, Lulu Han, Xiaoyang Liu, Junli Ren, Fang Wang, YingLiu, Yuanshan Lin

Figure 1 for FishMOT: A Simple and Effective Method for Fish Tracking Based on IoU Matching

Figure 2 for FishMOT: A Simple and Effective Method for Fish Tracking Based on IoU Matching

Figure 3 for FishMOT: A Simple and Effective Method for Fish Tracking Based on IoU Matching

Figure 4 for FishMOT: A Simple and Effective Method for Fish Tracking Based on IoU Matching

Fish tracking plays a vital role in understanding fish behavior and ecology. However, existing tracking methods face challenges in accuracy and robustness dues to morphological change of fish, occlusion and complex environment. This paper proposes FishMOT(Multiple Object Tracking for Fish), a novel fish tracking approach combining object detection and IoU matching, including basic module, interaction module and refind module. Wherein, a basic module performs target association based on IoU of detection boxes between successive frames to deal with morphological change of fish; an interaction module combines IoU of detection boxes and IoU of fish entity to handle occlusions; a refind module use spatio-temporal information uses spatio-temporal information to overcome the tracking failure resulting from the missed detection by the detector under complex environment. FishMOT reduces the computational complexity and memory consumption since it does not require complex feature extraction or identity assignment per fish, and does not need Kalman filter to predict the detection boxes of successive frame. Experimental results demonstrate FishMOT outperforms state-of-the-art multi-object trackers and specialized fish tracking tools in terms of MOTA, accuracy, computation time, memory consumption, etc.. Furthermore, the method exhibits excellent robustness and generalizability for varying environments and fish numbers. The simplified workflow and strong performance make FishMOT as a highly effective fish tracking approach. The source codes and pre-trained models are available at: https://github.com/gakkistar/FishMOT

Via

Access Paper or Ask Questions