Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Dongsheng Li

National University of Defense Technology, Changsha, China

Revisiting the Graph Reasoning Ability of Large Language Models: Case Studies in Translation, Connectivity and Shortest Path

Aug 18, 2024

Xinnan Dai, Qihao Wen, Yifei Shen, Hongzhi Wen, Dongsheng Li, Jiliang Tang, Caihua Shan

Figure 1 for Revisiting the Graph Reasoning Ability of Large Language Models: Case Studies in Translation, Connectivity and Shortest Path

Figure 2 for Revisiting the Graph Reasoning Ability of Large Language Models: Case Studies in Translation, Connectivity and Shortest Path

Figure 3 for Revisiting the Graph Reasoning Ability of Large Language Models: Case Studies in Translation, Connectivity and Shortest Path

Figure 4 for Revisiting the Graph Reasoning Ability of Large Language Models: Case Studies in Translation, Connectivity and Shortest Path

Abstract:Large Language Models (LLMs) have achieved great success in various reasoning tasks. In this work, we focus on the graph reasoning ability of LLMs. Although theoretical studies proved that LLMs are capable of handling graph reasoning tasks, empirical evaluations reveal numerous failures. To deepen our understanding on this discrepancy, we revisit the ability of LLMs on three fundamental graph tasks: graph description translation, graph connectivity, and the shortest-path problem. Our findings suggest that LLMs can fail to understand graph structures through text descriptions and exhibit varying performance for all these three fundamental tasks. Meanwhile, we perform a real-world investigation on knowledge graphs and make consistent observations with our findings. The codes and datasets are available.

Via

Access Paper or Ask Questions

ActPrompt: In-Domain Feature Adaptation via Action Cues for Video Temporal Grounding

Aug 13, 2024

Yubin Wang, Xinyang Jiang, De Cheng, Dongsheng Li, Cairong Zhao

Abstract:Video temporal grounding is an emerging topic aiming to identify specific clips within videos. In addition to pre-trained video models, contemporary methods utilize pre-trained vision-language models (VLM) to capture detailed characteristics of diverse scenes and objects from video frames. However, as pre-trained on images, VLM may struggle to distinguish action-sensitive patterns from static objects, making it necessary to adapt them to specific data domains for effective feature representation over temporal grounding. We address two primary challenges to achieve this goal. Specifically, to mitigate high adaptation costs, we propose an efficient preliminary in-domain fine-tuning paradigm for feature adaptation, where downstream-adaptive features are learned through several pretext tasks. Furthermore, to integrate action-sensitive information into VLM, we introduce Action-Cue-Injected Temporal Prompt Learning (ActPrompt), which injects action cues into the image encoder of VLM for better discovering action-sensitive patterns. Extensive experiments demonstrate that ActPrompt is an off-the-shelf training framework that can be effectively applied to various SOTA methods, resulting in notable improvements. The complete code used in this study is provided in the supplementary materials.

* 9 pages, 5 figures

Via

Access Paper or Ask Questions

GraphTransfer: A Generic Feature Fusion Framework for Collaborative Filtering

Aug 11, 2024

Jiafeng Xia, Dongsheng Li, Hansu Gu, Tun Lu, Ning Gu

Figure 1 for GraphTransfer: A Generic Feature Fusion Framework for Collaborative Filtering

Figure 2 for GraphTransfer: A Generic Feature Fusion Framework for Collaborative Filtering

Figure 3 for GraphTransfer: A Generic Feature Fusion Framework for Collaborative Filtering

Figure 4 for GraphTransfer: A Generic Feature Fusion Framework for Collaborative Filtering

Abstract:Graph Neural Networks (GNNs) have demonstrated effectiveness in collaborative filtering tasks due to their ability to extract powerful structural features. However, combining the graph features extracted from user-item interactions and auxiliary features extracted from user genres and item properties remains a challenge. Currently available fusion methods face two major issues: 1) simple methods such as concatenation and summation are generic, but not accurate in capturing feature relationships; 2) task-specific methods like attention mechanisms and meta paths may not be suitable for general feature fusion. To address these challenges, we present GraphTransfer, a simple but universal feature fusion framework for GNN-based collaborative filtering. Our method accurately fuses different types of features by first extracting graph features from the user-item interaction graph and auxiliary features from users and items using GCN. The proposed cross fusion module then effectively bridges the semantic gaps between the interaction scores of different features. Theoretical analysis and experiments on public datasets show that GraphTransfer outperforms other feature fusion methods in CF tasks. Additionally, we demonstrate the universality of our framework via empirical studies in three other scenarios, showing that GraphTransfer leads to significant improvements in the performance of CF algorithms.

Via

Access Paper or Ask Questions

AOTree: Aspect Order Tree-based Model for Explainable Recommendation

Jul 29, 2024

Wenxin Zhao, Peng Zhang, Hansu Gu, Dongsheng Li, Tun Lu, Ning Gu

Figure 1 for AOTree: Aspect Order Tree-based Model for Explainable Recommendation

Figure 2 for AOTree: Aspect Order Tree-based Model for Explainable Recommendation

Figure 3 for AOTree: Aspect Order Tree-based Model for Explainable Recommendation

Figure 4 for AOTree: Aspect Order Tree-based Model for Explainable Recommendation

Abstract:Recent recommender systems aim to provide not only accurate recommendations but also explanations that help users understand them better. However, most existing explainable recommendations only consider the importance of content in reviews, such as words or aspects, and ignore the ordering relationship among them. This oversight neglects crucial ordering dimensions in the human decision-making process, leading to suboptimal performance. Therefore, in this paper, we propose Aspect Order Tree-based (AOTree) explainable recommendation method, inspired by the Order Effects Theory from cognitive and decision psychology, in order to capture the dependency relationships among decisive factors. We first validate the theory in the recommendation scenario by analyzing the reviews of the users. Then, according to the theory, the proposed AOTree expands the construction of the decision tree to capture aspect orders in users' decision-making processes, and use attention mechanisms to make predictions based on the aspect orders. Extensive experiments demonstrate our method's effectiveness on rating predictions, and our approach aligns more consistently with the user' s decision-making process by displaying explanations in a particular order, thereby enhancing interpretability.

Via

Access Paper or Ask Questions

Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance

Jul 24, 2024

Ao Shen, Qiang Wang, Zhiquan Lai, Xionglve Li, Dongsheng Li

Figure 1 for Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance

Figure 2 for Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance

Figure 3 for Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance

Figure 4 for Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance

Abstract:Large Language Models (LLMs) have demonstrated impressive performance across various domains. However, the enormous number of model parameters makes fine-tuning challenging, significantly limiting their application and deployment. Existing solutions combine parameter quantization with Low-Rank Adaptation (LoRA), greatly reducing memory usage but resulting in noticeable performance degradation. In this paper, we identify an imbalance in fine-tuning quantized pre-trained models: overly complex adapter inputs and outputs versus low effective trainability of the adaptation. We propose Quantized LLMs with Balanced-rank Adaptation (Q-BaRA), which simplifies the adapter inputs and outputs while increasing the adapter's rank to achieve a more suitable balance for fine-tuning quantized LLMs. Additionally, for scenarios where fine-tuned LLMs need to be deployed as low-precision inference models, we introduce Quantization-Aware Fine-tuning with Higher Rank Adaptation (QA-HiRA), which simplifies the adapter inputs and outputs to align with the pre-trained model's block-wise quantization while employing a single matrix to achieve a higher rank. Both Q-BaRA and QA-HiRA are easily implemented and offer the following optimizations: (i) Q-BaRA consistently achieves the highest accuracy compared to baselines and other variants, requiring the same number of trainable parameters and computational effort; (ii) QA-HiRA naturally merges adapter parameters into the block-wise quantized model after fine-tuning, achieving the highest accuracy compared to other methods. We apply our Q-BaRA and QA-HiRA to the LLaMA and LLaMA2 model families and validate their effectiveness across different fine-tuning datasets and downstream scenarios. Code will be made available at \href{https://github.com/xiaocaigou/qbaraqahira}{https://github.com/xiaocaigou/qbaraqahira}

Via

Access Paper or Ask Questions

The VEP Booster: A Closed-Loop AI System for Visual EEG Biomarker Auto-generation

Jul 21, 2024

Junwen Luo, Chengyong Jiang, Qingyuan Chen, Dongqi Han, Yansen Wang, Biao Yan, Dongsheng Li, Jiayi Zhang

Figure 1 for The VEP Booster: A Closed-Loop AI System for Visual EEG Biomarker Auto-generation

Figure 2 for The VEP Booster: A Closed-Loop AI System for Visual EEG Biomarker Auto-generation

Figure 3 for The VEP Booster: A Closed-Loop AI System for Visual EEG Biomarker Auto-generation

Figure 4 for The VEP Booster: A Closed-Loop AI System for Visual EEG Biomarker Auto-generation

Abstract:Effective visual brain-machine interfaces (BMI) is based on reliable and stable EEG biomarkers. However, traditional adaptive filter-based approaches may suffer from individual variations in EEG signals, while deep neural network-based approaches may be hindered by the non-stationarity of EEG signals caused by biomarker attenuation and background oscillations. To address these challenges, we propose the Visual Evoked Potential Booster (VEP Booster), a novel closed-loop AI framework that generates reliable and stable EEG biomarkers under visual stimulation protocols. Our system leverages an image generator to refine stimulus images based on real-time feedback from human EEG signals, generating visual stimuli tailored to the preferences of primary visual cortex (V1) neurons and enabling effective targeting of neurons most responsive to stimuli. We validated our approach by implementing a system and employing steady-state visual evoked potential (SSVEP) visual protocols in five human subjects. Our results show significant enhancements in the reliability and utility of EEG biomarkers for all individuals, with the largest improvement in SSVEP response being 105%, the smallest being 28%, and the average increase being 76.5%. These promising results have implications for both clinical and technological applications

* 19 pages, 6 figures

Via

Access Paper or Ask Questions

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Jul 02, 2024

Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H. Abdi, Dongsheng Li, Chin-Yew Lin(+2 more)

Figure 1 for MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Figure 2 for MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Figure 3 for MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Figure 4 for MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Abstract:The computational challenges of Large Language Model (LLM) inference remain a significant barrier to their widespread deployment, especially as prompt lengths continue to increase. Due to the quadratic complexity of the attention computation, it takes 30 minutes for an 8B LLM to process a prompt of 1M tokens (i.e., the pre-filling stage) on a single A100 GPU. Existing methods for speeding up prefilling often fail to maintain acceptable accuracy or efficiency when applied to long-context LLMs. To address this gap, we introduce MInference (Milliontokens Inference), a sparse calculation method designed to accelerate pre-filling of long-sequence processing. Specifically, we identify three unique patterns in long-context attention matrices-the A-shape, Vertical-Slash, and Block-Sparsethat can be leveraged for efficient sparse computation on GPUs. We determine the optimal pattern for each attention head offline and dynamically build sparse indices based on the assigned pattern during inference. With the pattern and sparse indices, we perform efficient sparse attention calculations via our optimized GPU kernels to significantly reduce the latency in the pre-filling stage of long-context LLMs. Our proposed technique can be directly applied to existing LLMs without any modifications to the pre-training setup or additional fine-tuning. By evaluating on a wide range of downstream tasks, including InfiniteBench, RULER, PG-19, and Needle In A Haystack, and models including LLaMA-3-1M, GLM4-1M, Yi-200K, Phi-3-128K, and Qwen2-128K, we demonstrate that MInference effectively reduces inference latency by up to 10x for pre-filling on an A100, while maintaining accuracy. Our code is available at https://aka.ms/MInference.

Via

Access Paper or Ask Questions

EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms

Jun 20, 2024

Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan, Dongsheng Li, Deqing Yang

Figure 1 for EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms

Figure 2 for EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms

Figure 3 for EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms

Figure 4 for EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms

Abstract:The rise of powerful large language models (LLMs) has spurred a new trend in building LLM-based autonomous agents for solving complex tasks, especially multi-agent systems. Despite the remarkable progress, we notice that existing works are heavily dependent on human-designed frameworks, which greatly limits the functional scope and scalability of agent systems. How to automatically extend the specialized agent to multi-agent systems to improve task-solving capability still remains a significant challenge. In this paper, we introduce EvoAgent, a generic method to automatically extend expert agents to multi-agent systems via the evolutionary algorithm, thereby improving the effectiveness of LLM-based agents in solving tasks. Specifically, we consider the existing agent frameworks as the initial individual and then apply a series of evolutionary operators (e.g., mutation, crossover, selection, etc.) to generate multiple agents with diverse agent settings. EvoAgent can be generalized to any LLM-based agent framework, and can automatically extend the existing agent framework to multi-agent systems without any extra human designs. Experimental results across various tasks have shown that EvoAgent can automatically generate multiple expert agents and significantly enhance the task-solving capabilities of LLM-based agents.

* Work in process

Via

Access Paper or Ask Questions

Mitigate Position Bias in Large Language Models via Scaling a Single Dimension

Jun 04, 2024

Yijiong Yu, Huiqiang Jiang, Xufang Luo, Qianhui Wu, Chin-Yew Lin, Dongsheng Li, Yuqing Yang, Yongfeng Huang, Lili Qiu

Figure 1 for Mitigate Position Bias in Large Language Models via Scaling a Single Dimension

Figure 2 for Mitigate Position Bias in Large Language Models via Scaling a Single Dimension

Figure 3 for Mitigate Position Bias in Large Language Models via Scaling a Single Dimension

Figure 4 for Mitigate Position Bias in Large Language Models via Scaling a Single Dimension

Abstract:Large Language Models (LLMs) are increasingly applied in various real-world scenarios due to their excellent generalization capabilities and robust generative abilities. However, they exhibit position bias, also known as "lost in the middle", a phenomenon that is especially pronounced in long-context scenarios, which indicates the placement of the key information in different positions of a prompt can significantly affect accuracy. This paper first explores the micro-level manifestations of position bias, concluding that attention weights are a micro-level expression of position bias. It further identifies that, in addition to position embeddings, causal attention mask also contributes to position bias by creating position-specific hidden states. Based on these insights, we propose a method to mitigate position bias by scaling this positional hidden states. Experiments on the NaturalQuestions Multi-document QA, KV retrieval, LongBench and timeline reorder tasks, using various models including RoPE models, context windowextended models, and Alibi models, demonstrate the effectiveness and generalizability of our approach. Our method can improve performance by up to 15.2% by modifying just one dimension of hidden states. Our code is available at https://aka.ms/PositionalHidden.

Via

Access Paper or Ask Questions

DiffPhysBA: Diffusion-based Physical Backdoor Attack against Person Re-Identification in Real-World

May 30, 2024

Wenli Sun, Xinyang Jiang, Dongsheng Li, Cairong Zhao

Figure 1 for DiffPhysBA: Diffusion-based Physical Backdoor Attack against Person Re-Identification in Real-World

Figure 2 for DiffPhysBA: Diffusion-based Physical Backdoor Attack against Person Re-Identification in Real-World

Figure 3 for DiffPhysBA: Diffusion-based Physical Backdoor Attack against Person Re-Identification in Real-World

Figure 4 for DiffPhysBA: Diffusion-based Physical Backdoor Attack against Person Re-Identification in Real-World

Abstract:Person Re-Identification (ReID) systems pose a significant security risk from backdoor attacks, allowing adversaries to evade tracking or impersonate others. Beyond recognizing this issue, we investigate how backdoor attacks can be deployed in real-world scenarios, where a ReID model is typically trained on data collected in the digital domain and then deployed in a physical environment. This attack scenario requires an attack flow that embeds backdoor triggers in the digital domain realistically enough to also activate the buried backdoor in person ReID models in the physical domain. This paper realizes this attack flow by leveraging a diffusion model to generate realistic accessories on pedestrian images (e.g., bags, hats, etc.) as backdoor triggers. However, the noticeable domain gap between the triggers generated by the off-the-shelf diffusion model and their physical counterparts results in a low attack success rate. Therefore, we introduce a novel diffusion-based physical backdoor attack (DiffPhysBA) method that adopts a training-free similarity-guided sampling process to enhance the resemblance between generated and physical triggers. Consequently, DiffPhysBA can generate realistic attributes as semantic-level triggers in the digital domain and provides higher physical ASR compared to the direct paste method by 25.6% on the real-world test set. Through evaluations on newly proposed real-world and synthetic ReID test sets, DiffPhysBA demonstrates an impressive success rate exceeding 90% in both the digital and physical domains. Notably, it excels in digital stealth metrics and can effectively evade state-of-the-art defense methods.

Via

Access Paper or Ask Questions