Wei Liu

Peter

Event-based Video Person Re-identification via Cross-Modality and Temporal Collaboration

Jan 13, 2025

Renkai Li, Xin Yuan, Wei Liu, Xin Xu

Abstract: Video-based person re-identification (ReID) has become increasingly important due to its applications in video surveillance. By employing events in video-based person ReID, more motion information can be provided between continuous frames to improve recognition accuracy. Previous approaches have introduced event data into the video person ReID task as an aid, but they still cannot avoid the privacy leakage problem caused by RGB images. In order to avoid privacy attacks and to take advantage of the benefits of event data, we consider using only event data. To make full use of the information in the event stream, we propose a Cross-Modality and Temporal Collaboration (CMTC) network for event-based video person ReID. First, we design an event transform network to obtain corresponding auxiliary information from the input of raw events. Additionally, we propose a differential modality collaboration module to balance the roles of events and auxiliaries to achieve complementary effects. Furthermore, we introduce a temporal collaboration module to exploit motion information and appearance cues. Experimental results demonstrate that our method outperforms others in the task of event-based video person ReID.

* Accepted by ICASSP 2025

VAGeo: View-specific Attention for Cross-View Object Geo-Localization

Jan 13, 2025

Zhongyang Li, Xin Yuan, Wei Liu, Xin Xu

Abstract: Cross-view object geo-localization (CVOGL) aims to locate an object of interest in a captured ground- or drone-view image within the satellite image. However, existing works treat ground-view and drone-view query images equivalently, overlooking their inherent viewpoint discrepancies and the spatial correlation between the query image and the satellite-view reference image. To this end, this paper proposes a novel View-specific Attention Geo-localization method (VAGeo) for accurate CVOGL. Specifically, VAGeo contains two key modules: a view-specific positional encoding (VSPE) module and a channel-spatial hybrid attention (CSHA) module. At the object level, according to the characteristics of the different viewpoints of ground and drone query images, viewpoint-specific positional encodings are designed in the VSPE module to more accurately identify the click-point object of the query image. At the feature level, a hybrid attention is introduced in the CSHA module, combining channel attention and spatial attention mechanisms to learn discriminative features. Extensive experimental results demonstrate that the proposed VAGeo gains a significant performance improvement, i.e., improving acc@0.25/acc@0.5 on the CVOGL dataset from 45.43%/42.24% to 48.21%/45.22% for ground-view, and from 61.97%/57.66% to 66.19%/61.87% for drone-view.

* Accepted by ICASSP 2025
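
The acc@0.25/acc@0.5 figures quoted in the abstract can be read with the help of a small sketch. It assumes (this is not stated in the abstract) that acc@t is the fraction of queries whose predicted box overlaps the ground-truth box with an intersection-over-union of at least t. The function names `iou` and `acc_at` and the `(x1, y1, x2, y2)` box layout are illustrative choices, not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def acc_at(predictions, ground_truths, threshold):
    """Fraction of predictions whose IoU with the ground truth meets the threshold.

    Assumed reading of acc@threshold; the paper's exact protocol may differ.
    """
    hits = sum(iou(p, g) >= threshold for p, g in zip(predictions, ground_truths))
    return hits / len(predictions)
```

Under this reading, a prediction with IoU 0.33 counts toward acc@0.25 but not toward acc@0.5, which is why the second number in each pair is never larger than the first.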

Large Language Models for Bioinformatics

Jan 10, 2025

Wei Ruan, Yanjun Lyu, Jing Zhang, Jiazhang Cai, Peng Shu, Yang Ge, Yao Lu, Shang Gao, Yue Wang, Peilong Wang (+45 more)

Abstract: With the rapid advancements in large language model (LLM) technology and the emergence of bioinformatics-specific language models (BioLMs), there is a growing need for a comprehensive analysis of the current landscape, computational characteristics, and diverse applications. This survey aims to address this need by providing a thorough review of BioLMs, focusing on their evolution, classification, and distinguishing features, alongside a detailed examination of training methodologies, datasets, and evaluation frameworks. We explore the wide-ranging applications of BioLMs in critical areas such as disease diagnosis, drug discovery, and vaccine development, highlighting their impact and transformative potential in bioinformatics. We identify key challenges and limitations inherent in BioLMs, including data privacy and security concerns, interpretability issues, biases in training data and model outputs, and domain adaptation complexities. Finally, we highlight emerging trends and future directions, offering valuable insights to guide researchers and clinicians toward advancing BioLMs for increasingly sophisticated biological and clinical applications.

* 64 pages, 1 figure

Adaptive Path-Planning for Autonomous Robots: A UCH-Enhanced Q-Learning Approach

Jan 09, 2025

Wei Liu, Ruiyang Wang, Haonan Wang, Guangwei Liu

Abstract: Q-learning methods are widely used in robot path planning but often face challenges of inefficient search and slow convergence. We propose an Improved Q-learning (IQL) framework that enhances standard Q-learning in two significant ways. First, we introduce the Path Adaptive Collaborative Optimization (PACO) algorithm to optimize Q-table initialization, providing better initial estimates and accelerating learning. Second, we incorporate a Utility-Controlled Heuristic (UCH) mechanism with dynamically tuned parameters to optimize the reward function, enhancing the algorithm's accuracy and effectiveness in path-planning tasks. Extensive experiments in three different raster grid environments validate the superior performance of our IQL framework. The results demonstrate that our IQL algorithm outperforms existing methods, including FIQL, PP-QL-based CPP, DFQL, and QMABC algorithms, in terms of path-planning capabilities.

* IEEE, 2025
* 25 pages, 20 figures
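
To make the role of Q-table initialization concrete, here is a minimal sketch of standard tabular Q-learning on a grid. It does not implement PACO or UCH, whose details the abstract does not give. As a clearly labeled stand-in, `init_q_table` seeds each cell with the negative Manhattan distance to the goal, and `q_update` applies the textbook temporal-difference rule. Both function names, the grid layout, and the default `alpha` and `gamma` values are assumptions made for illustration.

```python
def init_q_table(width, height, goal, n_actions=4):
    """Seed Q-values with the negative Manhattan distance to the goal.

    Hypothetical stand-in for an informed initialization; not the PACO algorithm.
    """
    q_table = {}
    for x in range(width):
        for y in range(height):
            distance = abs(x - goal[0]) + abs(y - goal[1])
            # Cells closer to the goal start with higher (less negative) values.
            q_table[(x, y)] = [float(-distance)] * n_actions
    return q_table


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard Q-learning update and return the new Q-value."""
    td_target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]
```

With an all-zero table the agent has no preference between actions until rewards propagate back from the goal; a distance-based seed already ranks neighboring cells, which is the kind of head start a better initialization is meant to provide.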

Multimodal Graph Constrastive Learning and Prompt for ChartQA

Jan 08, 2025

Yue Dai, Soyeon Caren Han, Wei Liu

Abstract: ChartQA presents significant challenges due to the complex distribution of chart elements and the implicit patterns embedded within the underlying data. In this chapter, we have developed a joint multimodal scene graph for charts, explicitly representing the relationships between chart elements and their associated patterns. Our proposed multimodal scene graph consists of two components: a visual graph and a textual graph, each designed to capture the structural and semantic information within the chart. To unify representations across these different modalities, we introduce a multimodal graph contrastive learning approach that learns unified representations by maximizing similarity between nodes representing the same object across multimodal graphs. The learned graph representations can be seamlessly incorporated into a transformer decoder as a soft prompt. Additionally, given the growing need for Multimodal Large Language Models (MLLMs) in zero-shot scenarios, we have designed Chain-of-Thought (CoT) prompts for MLLMs to reduce hallucinations. We tested both methods on public benchmarks such as ChartQA, OpenCQA, and ChartX, demonstrating improved performance and validating the effectiveness of our proposed methods.
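
The idea of maximizing similarity between nodes that represent the same object in two graphs can be sketched with a generic InfoNCE-style contrastive loss. This is a common formulation and is used here only as an assumption; the abstract does not specify the loss. The names `cosine` and `info_nce`, the temperature value, and the convention that row i of each list describes the same object are all illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(visual_nodes, textual_nodes, temperature=0.1):
    """Average InfoNCE loss, treating (visual[i], textual[i]) as the positive pair.

    Generic contrastive objective assumed for illustration, not the paper's loss.
    """
    n = len(visual_nodes)
    total = 0.0
    for i in range(n):
        logits = [cosine(visual_nodes[i], textual_nodes[j]) / temperature
                  for j in range(n)]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / n
```

The loss is small when each visual node is most similar to its own textual counterpart and large when it is closer to some other node, so minimizing it pulls matching nodes together across the two graphs.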

Modeling All Response Surfaces in One for Conditional Search Spaces

Jan 08, 2025

Jiaxing Li, Wei Liu, Chao Xue, Yibing Zhan, Xiaoxing Wang, Weifeng Liu, Dacheng Tao

Abstract: Bayesian Optimization (BO) is a sample-efficient black-box optimizer commonly used in search spaces where hyperparameters are independent. However, in many practical AutoML scenarios, there will be dependencies among hyperparameters, forming a conditional search space, which can be partitioned into structurally distinct subspaces. The structure and dimensionality of hyperparameter configurations vary across these subspaces, challenging the application of BO. Some previous BO works have proposed solutions to develop multiple Gaussian Process (GP) models in these subspaces. However, these approaches tend to be inefficient as they require a substantial number of observations to guarantee each GP's performance and cannot capture relationships between hyperparameters across different subspaces. To address these issues, this paper proposes a novel approach to model the response surfaces of all subspaces in one, which can model the relationships between hyperparameters elegantly via a self-attention mechanism. Concretely, we design a structure-aware hyperparameter embedding to preserve the structural information. Then, we introduce an attention-based deep feature extractor, capable of projecting configurations with different structures from various subspaces into a unified feature space, where the response surfaces can be formulated using a single standard Gaussian Process. The empirical results on a simulation function, various real-world tasks, and the HPO-B benchmark demonstrate that our proposed approach improves the efficacy and efficiency of BO within conditional search spaces.

TimelineKGQA: A Comprehensive Question-Answer Pair Generator for Temporal Knowledge Graphs

Jan 08, 2025

Qiang Sun, Sirui Li, Du Huynh, Mark Reynolds, Wei Liu

Abstract: Question answering over temporal knowledge graphs (TKGs) is crucial for understanding evolving facts and relationships, yet its development is hindered by limited datasets and difficulties in generating custom QA pairs. We propose a novel categorization framework based on timeline-context relationships, along with TimelineKGQA, a universal temporal QA generator applicable to any TKG. The code is available as an open-source Python package at https://github.com/PascalSun/TimelineKGQA.

More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives

Jan 07, 2025

Xiaoqing Zhang, Ang Lv, Yuhan Liu, Flood Sung, Wei Liu, Shuo Shang, Xiuying Chen, Rui Yan

Abstract: Large language models (LLMs) excel at few-shot in-context learning (ICL) without requiring parameter updates. However, as the number of ICL demonstrations increases from a few to many, performance tends to plateau and eventually decline. We identify two primary causes for this trend: the suboptimal negative log-likelihood (NLL) optimization objective and the incremental data noise. To address these issues, we introduce DR-ICL, a novel optimization method that enhances model performance through Differentiated Learning and advantage-based Reweighting objectives. Globally, DR-ICL utilizes differentiated learning to optimize the NLL objective, ensuring that many-shot performance surpasses zero-shot levels. Locally, it dynamically adjusts the weighting of many-shot demonstrations by leveraging cumulative advantages inspired by reinforcement learning, thereby improving generalization. This approach allows the model to handle varying numbers of shots effectively, mitigating the impact of noisy data. Recognizing the lack of multi-task datasets with diverse many-shot distributions, we develop the Many-Shot ICL Benchmark (MICLB), a large-scale benchmark for fine-tuning that covers shot numbers from 1 to 350 within sequences of up to 8,000 tokens. MICLB facilitates the evaluation of many-shot ICL strategies across seven prominent NLP tasks and 50 distinct datasets. Experimental results demonstrate that LLMs enhanced with DR-ICL achieve significant improvements in many-shot setups across various tasks, including both in-domain and out-of-domain scenarios. We release the code and benchmark dataset hoping to facilitate further research in many-shot ICL.

* 13 pages, 8 figures, 11 tables

Distillation-Enhanced Physical Adversarial Attacks

Jan 04, 2025

Wei Liu, Yonglin Wu, Chaoqun Li, Zhuodong Liu, Huanqian Yan

Abstract: The study of physical adversarial patches is crucial for identifying vulnerabilities in AI-based recognition systems and developing more robust deep learning models. While recent research has focused on improving patch stealthiness for greater practical applicability, achieving an effective balance between stealth and attack performance remains a significant challenge. To address this issue, we propose a novel physical adversarial attack method that leverages knowledge distillation. Specifically, we first define a stealthy color space tailored to the target environment to ensure smooth blending. Then, we optimize an adversarial patch in an unconstrained color space, which serves as the 'teacher' patch. Finally, we use an adversarial knowledge distillation module to transfer the teacher patch's knowledge to the 'student' patch, guiding the optimization of the stealthy patch. Experimental results show that our approach improves attack performance by 20%, while maintaining stealth, highlighting its practical value.

* 7 pages, 5 figures

DoTA: Weight-Decomposed Tensor Adaptation for Large Language Models

Dec 30, 2024

Xiaolin Hu, Xiang Cheng, Peiyu Liu, Wei Liu, Jian Luan, Bin Wang, Yong Liu

Abstract: Low-rank adaptation (LoRA) reduces the computational and memory demands of fine-tuning large language models (LLMs) by approximating updates with low-rank matrices. However, low-rank approximation in two-dimensional space fails to capture high-dimensional structures within the target matrix. Recently, tensor decomposition methods have been explored for fine-tuning LLMs, leveraging their ability to extract structured information. Yet, these approaches primarily rely on random initialization, and the impact of initialization on tensor adaptation remains underexplored. In this paper, we reveal that random initialization yields a validation loss that diverges significantly from the one achieved by full fine-tuning. To address this, we propose Weight-Decomposed Tensor Adaptation (DoTA), which leverages the Matrix Product Operator (MPO) decomposition of pre-trained weights for effective initialization in fine-tuning LLMs. Additionally, we introduce QDoTA, a quantized version of DoTA designed for 4-bit quantization. Experiments on commonsense and arithmetic reasoning tasks show that DoTA outperforms random initialization methods with fewer parameters. QDoTA further reduces memory consumption and achieves comparable performance to DoTA on commonsense reasoning tasks. We will release our code to support future research.

* 12 pages, 6 figures
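
As background for the abstract's starting point, the sketch below illustrates plain LoRA, not DoTA or its MPO decomposition. It assumes the usual formulation in which a frozen weight W of shape d_out x d_in is adapted by a trainable product B A, with B of shape d_out x r and A of shape r x d_in. The helper names, the `scale` argument, and the use of nested Python lists for matrices are illustrative choices.

```python
def lora_param_count(d_out, d_in, rank):
    """Trainable parameters in a rank-r update: B is d_out x r, A is r x d_in."""
    return rank * (d_out + d_in)


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, scale=1.0):
    """Compute y = W x + scale * B (A x) without forming the full update B A."""
    base = mat_vec(W, x)
    # Project down to rank r first, then back up to d_out.
    low_rank = mat_vec(B, mat_vec(A, x))
    return [b + scale * l for b, l in zip(base, low_rank)]
```

For a 4096 x 4096 weight, `lora_param_count(4096, 4096, 8)` gives 65,536 trainable values against 16,777,216 for a full update, which is the saving the first sentence of the abstract refers to.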
