Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jun Zhou

Yilin

Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

Jun 20, 2024

Junjie Wang, Mingyang Chen, Binbin Hu, Dan Yang, Ziqi Liu, Yue Shen, Peng Wei, Zhiqiang Zhang, Jinjie Gu, Jun Zhou(+3 more)

Figure 1 for Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

Figure 2 for Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

Figure 3 for Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

Figure 4 for Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

Abstract:Improving the performance of large language models (LLMs) in complex question-answering (QA) scenarios has always been a research focal point. Recent studies have attempted to enhance LLMs' performance by combining step-wise planning with external retrieval. While effective for advanced models like GPT-3.5, smaller LLMs face challenges in decomposing complex questions, necessitating supervised fine-tuning. Previous work has relied on manual annotation and knowledge distillation from teacher LLMs, which are time-consuming and not accurate enough. In this paper, we introduce a novel framework for enhancing LLMs' planning capabilities by using planning data derived from knowledge graphs (KGs). LLMs fine-tuned with this data have improved planning capabilities, better equipping them to handle complex QA tasks that involve retrieval. Evaluations on multiple datasets, including our newly proposed benchmark, highlight the effectiveness of our framework and the benefits of KG-derived planning data.

* Work in progress

Via

Access Paper or Ask Questions

Epicardium Prompt-guided Real-time Cardiac Ultrasound Frame-to-volume Registration

Jun 20, 2024

Long Lei, Jun Zhou, Jialun Pei, Baoliang Zhao, Yueming Jin, Yuen-Chun Jeremy Teoh, Jing Qin, Pheng-Ann Heng

Figure 1 for Epicardium Prompt-guided Real-time Cardiac Ultrasound Frame-to-volume Registration

Figure 2 for Epicardium Prompt-guided Real-time Cardiac Ultrasound Frame-to-volume Registration

Figure 3 for Epicardium Prompt-guided Real-time Cardiac Ultrasound Frame-to-volume Registration

Figure 4 for Epicardium Prompt-guided Real-time Cardiac Ultrasound Frame-to-volume Registration

Abstract:A comprehensive guidance view for cardiac interventional surgery can be provided by the real-time fusion of the intraoperative 2D images and preoperative 3D volume based on the ultrasound frame-to-volume registration. However, cardiac ultrasound images are characterized by a low signal-to-noise ratio and small differences between adjacent frames, coupled with significant dimension variations between 2D frames and 3D volumes to be registered, resulting in real-time and accurate cardiac ultrasound frame-to-volume registration being a very challenging task. This paper introduces a lightweight end-to-end Cardiac Ultrasound frame-to-volume Registration network, termed CU-Reg. Specifically, the proposed model leverages epicardium prompt-guided anatomical clues to reinforce the interaction of 2D sparse and 3D dense features, followed by a voxel-wise local-global aggregation of enhanced features, thereby boosting the cross-dimensional matching effectiveness of low-quality ultrasound modalities. We further embed an inter-frame discriminative regularization term within the hybrid supervised learning to increase the distinction between adjacent slices in the same ultrasound volume to ensure registration stability. Experimental results on the reprocessed CAMUS dataset demonstrate that our CU-Reg surpasses existing methods in terms of registration accuracy and efficiency, meeting the guidance requirements of clinical cardiac interventional surgery.

* This paper has been accepted by MICCAI 2024

Via

Access Paper or Ask Questions

GMP-AR: Granularity Message Passing and Adaptive Reconciliation for Temporal Hierarchy Forecasting

Jun 18, 2024

Fan Zhou, Chen Pan, Lintao Ma, Yu Liu, James Zhang, Jun Zhou, Hongyuan Mei, Weitao Lin, Zi Zhuang, Wenxin Ning(+2 more)

Figure 1 for GMP-AR: Granularity Message Passing and Adaptive Reconciliation for Temporal Hierarchy Forecasting

Figure 2 for GMP-AR: Granularity Message Passing and Adaptive Reconciliation for Temporal Hierarchy Forecasting

Figure 3 for GMP-AR: Granularity Message Passing and Adaptive Reconciliation for Temporal Hierarchy Forecasting

Figure 4 for GMP-AR: Granularity Message Passing and Adaptive Reconciliation for Temporal Hierarchy Forecasting

Abstract:Time series forecasts of different temporal granularity are widely used in real-world applications, e.g., sales prediction in days and weeks for making different inventory plans. However, these tasks are usually solved separately without ensuring coherence, which is crucial for aligning downstream decisions. Previous works mainly focus on ensuring coherence with some straightforward methods, e.g., aggregation from the forecasts of fine granularity to the coarse ones, and allocation from the coarse granularity to the fine ones. These methods merely take the temporal hierarchical structure to maintain coherence without improving the forecasting accuracy. In this paper, we propose a novel granularity message-passing mechanism (GMP) that leverages temporal hierarchy information to improve forecasting performance and also utilizes an adaptive reconciliation (AR) strategy to maintain coherence without performance loss. Furthermore, we introduce an optimization module to achieve task-based targets while adhering to more real-world constraints. Experiments on real-world datasets demonstrate that our framework (GMP-AR) achieves superior performances on temporal hierarchical forecasting tasks compared to state-of-the-art methods. In addition, our framework has been successfully applied to a real-world task of payment traffic management in Alipay by integrating with the task-based optimization module.

Via

Access Paper or Ask Questions

Interpretable Catastrophic Forgetting of Large Language Model Fine-tuning via Instruction Vector

Jun 18, 2024

Gangwei Jiang, Zhaoyi Li, Caigao Jiang, Siqiao Xue, Jun Zhou, Linqi Song, Defu Lian, Ying Wei

Figure 1 for Interpretable Catastrophic Forgetting of Large Language Model Fine-tuning via Instruction Vector

Figure 2 for Interpretable Catastrophic Forgetting of Large Language Model Fine-tuning via Instruction Vector

Figure 3 for Interpretable Catastrophic Forgetting of Large Language Model Fine-tuning via Instruction Vector

Figure 4 for Interpretable Catastrophic Forgetting of Large Language Model Fine-tuning via Instruction Vector

Abstract:Fine-tuning large language models (LLMs) can cause them to lose their general capabilities. However, the intrinsic mechanisms behind such forgetting remain unexplored. In this paper, we begin by examining this phenomenon by focusing on knowledge understanding and instruction following, with the latter identified as the main contributor to forgetting during fine-tuning. Consequently, we propose the Instruction Vector (IV) framework to capture model representations highly related to specific instruction-following capabilities, thereby making it possible to understand model-intrinsic forgetting. Through the analysis of IV dynamics pre and post-training, we suggest that fine-tuning mostly adds specialized reasoning patterns instead of erasing previous skills, which may appear as forgetting. Building on this insight, we develop IV-guided training, which aims to preserve original computation graph, thereby mitigating catastrophic forgetting. Empirical tests on three benchmarks confirm the efficacy of this new approach, supporting the relationship between IVs and forgetting. Our code will be made available soon.

Via

Access Paper or Ask Questions

Asymmetrical Siamese Network for Point Clouds Normal Estimation

Jun 14, 2024

Wei Jin, Jun Zhou

Figure 1 for Asymmetrical Siamese Network for Point Clouds Normal Estimation

Figure 2 for Asymmetrical Siamese Network for Point Clouds Normal Estimation

Figure 3 for Asymmetrical Siamese Network for Point Clouds Normal Estimation

Figure 4 for Asymmetrical Siamese Network for Point Clouds Normal Estimation

Abstract:In recent years, deep learning-based point cloud normal estimation has made great progress. However, existing methods mainly rely on the PCPNet dataset, leading to overfitting. In addition, the correlation between point clouds with different noise scales remains unexplored, resulting in poor performance in cross-domain scenarios. In this paper, we explore the consistency of intrinsic features learned from clean and noisy point clouds using an Asymmetric Siamese Network architecture. By applying reasonable constraints between features extracted from different branches, we enhance the quality of normal estimation. Moreover, we introduce a novel multi-view normal estimation dataset that includes a larger variety of shapes with different noise levels. Evaluation of existing methods on this new dataset reveals their inability to adapt to different types of shapes, indicating a degree of overfitting. Extensive experiments show that the proposed dataset poses significant challenges for point cloud normal estimation and that our feature constraint mechanism effectively improves upon existing methods and reduces overfitting in current architectures.

Via

Access Paper or Ask Questions

Personalized Binomial DAGs Learning with Network Structured Covariates

Jun 10, 2024

Boxin Zhao, Weishi Wang, Dingyuan Zhu, Ziqi Liu, Dong Wang, Zhiqiang Zhang, Jun Zhou, Mladen Kolar

Figure 1 for Personalized Binomial DAGs Learning with Network Structured Covariates

Figure 2 for Personalized Binomial DAGs Learning with Network Structured Covariates

Figure 3 for Personalized Binomial DAGs Learning with Network Structured Covariates

Figure 4 for Personalized Binomial DAGs Learning with Network Structured Covariates

Abstract:The causal dependence in data is often characterized by Directed Acyclic Graphical (DAG) models, widely used in many areas. Causal discovery aims to recover the DAG structure using observational data. This paper focuses on causal discovery with multi-variate count data. We are motivated by real-world web visit data, recording individual user visits to multiple websites. Building a causal diagram can help understand user behavior in transitioning between websites, inspiring operational strategy. A challenge in modeling is user heterogeneity, as users with different backgrounds exhibit varied behaviors. Additionally, social network connections can result in similar behaviors among friends. We introduce personalized Binomial DAG models to address heterogeneity and network dependency between observations, which are common in real-world applications. To learn the proposed DAG model, we develop an algorithm that embeds the network structure into a dimension-reduced covariate, learns each node's neighborhood to reduce the DAG search space, and explores the variance-mean relation to determine the ordering. Simulations show our algorithm outperforms state-of-the-art competitors in heterogeneous data. We demonstrate its practical usefulness on a real-world web visit dataset.

Via

Access Paper or Ask Questions

Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts

May 30, 2024

Chunjing Gan, Dan Yang, Binbin Hu, Hanxiao Zhang, Siyuan Li, Ziqi Liu, Yue Shen, Lin Ju, Zhiqiang Zhang, Jinjie Gu(+2 more)

Figure 1 for Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts

Figure 2 for Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts

Figure 3 for Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts

Figure 4 for Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts

Abstract:In recent years, large language models (LLMs) have made remarkable achievements in various domains. However, the untimeliness and cost of knowledge updates coupled with hallucination issues of LLMs have curtailed their applications in knowledge intensive tasks, where retrieval augmented generation (RAG) can be of help. Nevertheless, existing retrieval augmented models typically use similarity as a bridge between queries and documents and follow a retrieve then read procedure. In this work, we argue that similarity is not always the panacea and totally relying on similarity would sometimes degrade the performance of retrieval augmented generation. To this end, we propose MetRag, a Multi layEred Thoughts enhanced Retrieval Augmented Generation framework. To begin with, beyond existing similarity oriented thought, we embrace a small scale utility model that draws supervision from an LLM for utility oriented thought and further come up with a smarter model by comprehensively combining the similarity and utility oriented thoughts. Furthermore, given the fact that the retrieved document set tends to be huge and using them in isolation makes it difficult to capture the commonalities and characteristics among them, we propose to make an LLM as a task adaptive summarizer to endow retrieval augmented generation with compactness-oriented thought. Finally, with multi layered thoughts from the precedent stages, an LLM is called for knowledge augmented generation. Extensive experiments on knowledge-intensive tasks have demonstrated the superiority of MetRag.

* 12 pages

Via

Access Paper or Ask Questions

Vim-F: Visual State Space Model Benefiting from Learning in the Frequency Domain

May 29, 2024

Juntao Zhang, Kun Bian, Peng Cheng, Wenbo An, Jianning Liu, Jun Zhou

Figure 1 for Vim-F: Visual State Space Model Benefiting from Learning in the Frequency Domain

Figure 2 for Vim-F: Visual State Space Model Benefiting from Learning in the Frequency Domain

Figure 3 for Vim-F: Visual State Space Model Benefiting from Learning in the Frequency Domain

Figure 4 for Vim-F: Visual State Space Model Benefiting from Learning in the Frequency Domain

Abstract:In recent years, State Space Models (SSMs) with efficient hardware-aware designs, known as the Mamba deep learning models, have made significant progress in modeling long sequences such as language understanding. Therefore, building efficient and general-purpose visual backbones based on SSMs is a promising direction. Compared to traditional convolutional neural networks (CNNs) and Vision Transformers (ViTs), the performance of Vision Mamba (ViM) methods is not yet fully competitive. To enable SSMs to process image data, ViMs typically flatten 2D images into 1D sequences, inevitably ignoring some 2D local dependencies, thereby weakening the model's ability to interpret spatial relationships from a global perspective. We use Fast Fourier Transform (FFT) to obtain the spectrum of the feature map and add it to the original feature map, enabling ViM to model a unified visual representation in both frequency and spatial domains. The introduction of frequency domain information enables ViM to have a global receptive field during scanning. We propose a novel model called Vim-F, which employs pure Mamba encoders and scans in both the frequency and spatial domains. Moreover, we question the necessity of position embedding in ViM and remove it accordingly in Vim-F, which helps to fully utilize the efficient long-sequence modeling capability of ViM. Finally, we redesign a patch embedding for Vim-F, leveraging a convolutional stem to capture more local correlations, further improving the performance of Vim-F. Code is available at: \url{https://github.com/yws-wxs/Vim-F}.

Via

Access Paper or Ask Questions

Your decision path does matter in pre-training industrial recommenders with multi-source behaviors

May 27, 2024

Chunjing Gan, Binbin Hu, Bo Huang, Ziqi Liu, Jian Ma, Zhiqiang Zhang, Wenliang Zhong, Jun Zhou

Figure 1 for Your decision path does matter in pre-training industrial recommenders with multi-source behaviors

Figure 2 for Your decision path does matter in pre-training industrial recommenders with multi-source behaviors

Figure 3 for Your decision path does matter in pre-training industrial recommenders with multi-source behaviors

Figure 4 for Your decision path does matter in pre-training industrial recommenders with multi-source behaviors

Abstract:Online service platforms offering a wide range of services through miniapps have become crucial for users who visit these platforms with clear intentions to find services they are interested in. Aiming at effective content delivery, cross-domain recommendation are introduced to learn high-quality representations by transferring behaviors from data-rich scenarios. However, these methods overlook the impact of the decision path that users take when conduct behaviors, that is, users ultimately exhibit different behaviors based on various intents. To this end, we propose HIER, a novel Hierarchical decIsion path Enhanced Representation learning for cross-domain recommendation. With the help of graph neural networks for high-order topological information of the knowledge graph between multi-source behaviors, we further adaptively learn decision paths through well-designed exemplar-level and information bottleneck based contrastive learning. Extensive experiments in online and offline environments show the superiority of HIER.

Via

Access Paper or Ask Questions

Keypoint-based Progressive Chain-of-Thought Distillation for LLMs

May 25, 2024

Kaituo Feng, Changsheng Li, Xiaolu Zhang, Jun Zhou, Ye Yuan, Guoren Wang

Figure 1 for Keypoint-based Progressive Chain-of-Thought Distillation for LLMs

Figure 2 for Keypoint-based Progressive Chain-of-Thought Distillation for LLMs

Figure 3 for Keypoint-based Progressive Chain-of-Thought Distillation for LLMs

Figure 4 for Keypoint-based Progressive Chain-of-Thought Distillation for LLMs

Abstract:Chain-of-thought distillation is a powerful technique for transferring reasoning abilities from large language models (LLMs) to smaller student models. Previous methods typically require the student to mimic the step-by-step rationale produced by LLMs, often facing the following challenges: (i) Tokens within a rationale vary in significance, and treating them equally may fail to accurately mimic keypoint tokens, leading to reasoning errors. (ii) They usually distill knowledge by consistently predicting all the steps in a rationale, which falls short in distinguishing the learning order of step generation. This diverges from the human cognitive progression of starting with easy tasks and advancing to harder ones, resulting in sub-optimal outcomes. To this end, we propose a unified framework, called KPOD, to address these issues. Specifically, we propose a token weighting module utilizing mask learning to encourage accurate mimicry of keypoint tokens by the student during distillation. Besides, we develop an in-rationale progressive distillation strategy, starting with training the student to generate the final reasoning steps and gradually extending to cover the entire rationale. To accomplish this, a weighted token generation loss is proposed to assess step reasoning difficulty, and a value function is devised to schedule the progressive distillation by considering both step difficulty and question diversity. Extensive experiments on four reasoning benchmarks illustrate our KPOD outperforms previous methods by a large margin.

* Accepted by ICML 2024

Via

Access Paper or Ask Questions