Hui Xue

KEHRL: Learning Knowledge-Enhanced Language Representations with Hierarchical Reinforcement Learning

Jun 24, 2024

Dongyang Li, Taolin Zhang, Longtao Huang, Chengyu Wang, Xiaofeng He, Hui Xue

Abstract:Knowledge-enhanced pre-trained language models (KEPLMs) leverage relation triples from knowledge graphs (KGs) and integrate these external data sources into language models via self-supervised learning. Previous works treat knowledge enhancement as two independent operations, i.e., knowledge injection and knowledge integration. In this paper, we propose to learn Knowledge-Enhanced language representations with Hierarchical Reinforcement Learning (KEHRL), which jointly addresses the problems of detecting positions for knowledge injection and integrating external knowledge into the model in order to avoid injecting inaccurate or irrelevant knowledge. Specifically, a high-level reinforcement learning (RL) agent utilizes both internal and prior knowledge to iteratively detect essential positions in texts for knowledge injection, which filters out less meaningful entities to avoid diverting the knowledge learning direction. Once the entity positions are selected, a relevant triple filtration module is triggered to perform low-level RL to dynamically refine the triples associated with polysemic entities through binary-valued actions. Experiments validate KEHRL's effectiveness in probing factual knowledge and enhancing the model's performance on various natural language understanding tasks.

On the Role of Long-tail Knowledge in Retrieval Augmented Large Language Models

Jun 24, 2024

Dongyang Li, Junbing Yan, Taolin Zhang, Chengyu Wang, Xiaofeng He, Longtao Huang, Hui Xue, Jun Huang

Abstract:Retrieval augmented generation (RAG) exhibits outstanding performance in promoting the knowledge capabilities of large language models (LLMs) with retrieved documents related to user queries. However, RAG only focuses on improving the response quality of LLMs via enhancing queries indiscriminately with retrieved information, paying little attention to what type of knowledge LLMs really need to answer original queries more accurately. In this paper, we suggest that long-tail knowledge is crucial for RAG as LLMs have already remembered common world knowledge during large-scale pre-training. Based on our observation, we propose a simple but effective long-tail knowledge detection method for LLMs. Specifically, the novel Generative Expected Calibration Error (GECE) metric is derived to measure the "long-tailness" of knowledge based on both statistics and semantics. Hence, we retrieve relevant documents and infuse them into the model for patching knowledge loopholes only when the input query relates to long-tail knowledge. Experiments show that, compared to existing RAG pipelines, our method achieves over 4x speedup in average inference time and consistent performance improvement in downstream tasks.
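
The abstract does not define GECE, so the sketch below shows only the standard Expected Calibration Error for classification that the metric's name refers to, followed by a hypothetical gate that retrieves documents only for queries scored as long-tail. The function names, the bin count, the `long_tail_score` callable, and the threshold are illustrative assumptions, not the authors' method.

```python
from typing import Callable, List, Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Standard ECE: weighted mean of |accuracy - confidence| over equal-width bins.

    Assumes every confidence lies in [0, 1].
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(confidences):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append(i)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        acc = sum(1 for i in members if correct[i]) / len(members)
        conf = sum(confidences[i] for i in members) / len(members)
        ece += (len(members) / total) * abs(acc - conf)
    return ece


def answer(
    query: str,
    long_tail_score: Callable[[str], float],  # hypothetical scorer (stands in for GECE)
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], object],
    threshold: float = 0.5,  # assumed cut-off, not from the paper
):
    """Hypothetical gate: call the retriever only when the query looks long-tail."""
    docs = retrieve(query) if long_tail_score(query) > threshold else []
    return generate(query, docs)
```

Skipping the retriever for queries the model already handles well is one plausible source of the inference-time speedup the abstract reports; how the paper scores queries is not stated here.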

Recurrent Inference Machine for Medical Image Registration

Jun 19, 2024

Yi Zhang, Yidong Zhao, Hui Xue, Peter Kellman, Stefan Klein, Qian Tao

Abstract:Image registration is essential for medical image applications where alignment of voxels across multiple images is needed for qualitative or quantitative analysis. With recent advancements in deep neural networks and parallel computing, deep learning-based medical image registration methods have become competitive with their flexible modelling and fast inference capabilities. However, compared to traditional optimization-based registration methods, the speed advantage may come at the cost of registration performance at inference time. Besides, deep neural networks ideally demand large training datasets while optimization-based methods are training-free. To improve registration accuracy and data efficiency, we propose a novel image registration method, termed Recurrent Inference Image Registration (RIIR) network. RIIR is formulated as a meta-learning solver to the registration problem in an iterative manner. RIIR addresses the accuracy and data efficiency issues by learning the update rule of optimization, with implicit regularization combined with explicit gradient input. We evaluated RIIR extensively on brain MRI and quantitative cardiac MRI datasets, in terms of both registration accuracy and training data efficiency. Our experiments showed that RIIR outperformed a range of deep learning-based methods, even with only 5% of the training data, demonstrating high data efficiency. Key findings from our ablation studies highlighted the important added value of the hidden states introduced in the recurrent inference framework for meta-learning. Our proposed RIIR offers a highly data-efficient framework for deep learning-based medical image registration.

* Preprint

DAFNet: Dynamic Auxiliary Fusion for Sequential Model Editing in Large Language Models

May 31, 2024

Taolin Zhang, Qizhou Chen, Dongyang Li, Chengyu Wang, Xiaofeng He, Longtao Huang, Hui Xue, Jun Huang

Abstract:Recently, while large language models (LLMs) have demonstrated impressive results, they still suffer from hallucination, i.e., the generation of false information. Model editing is the task of fixing factual mistakes in LLMs; yet, most previous works treat it as a one-time task, paying little attention to ever-emerging mistakes generated by LLMs. We address the task of sequential model editing (SME) that aims to rectify mistakes continuously. A Dynamic Auxiliary Fusion Network (DAFNet) is designed to enhance the semantic interaction among the factual knowledge within the entire sequence, preventing catastrophic forgetting during the editing process of multiple knowledge triples. Specifically, (1) for semantic fusion within a relation triple, we aggregate the intra-editing attention flow into auto-regressive self-attention with token-level granularity in LLMs. We further leverage multi-layer diagonal inter-editing attention flow to update the weighted representations of the entire sequence-level granularity. (2) Considering that auxiliary parameters are required to store the knowledge for sequential editing, we construct a new dataset named DAFSet, fulfilling recent, popular, long-tail and robust properties to enhance the generality of sequential editing. Experiments show DAFNet significantly outperforms strong baselines in single-turn and sequential editing. The usage of DAFSet also consistently improves the performance of other auxiliary network-based methods in various scenarios.

* ACL2024 findings

S-Eval: Automatic and Adaptive Test Generation for Benchmarking Safety Evaluation of Large Language Models

May 28, 2024

Xiaohan Yuan, Jinfeng Li, Dongxia Wang, Yuefeng Chen, Xiaofeng Mao, Longtao Huang, Hui Xue, Wenhai Wang, Kui Ren, Jingyi Wang

Abstract:Large Language Models have gained considerable attention for their revolutionary capabilities. However, there is also growing concern about their safety implications, making a comprehensive safety evaluation for LLMs urgently needed before model deployment. In this work, we propose S-Eval, a new comprehensive, multi-dimensional and open-ended safety evaluation benchmark. At the core of S-Eval is a novel LLM-based automatic test prompt generation and selection framework, which trains an expert testing LLM Mt combined with a range of test selection strategies to automatically construct a high-quality test suite for the safety evaluation. The key to the automation of this process is a novel expert safety-critique LLM Mc able to quantify the riskiness score of an LLM's response, and additionally produce risk tags and explanations. Besides, the generation process is also guided by a carefully designed risk taxonomy with four different levels, covering comprehensive and multi-dimensional safety risks of concern. Based on these, we systematically construct a new and large-scale safety evaluation benchmark for LLMs consisting of 220,000 evaluation prompts, including 20,000 base risk prompts (10,000 in Chinese and 10,000 in English) and 200,000 corresponding attack prompts derived from 10 popular adversarial instruction attacks against LLMs. Moreover, considering the rapid evolution of LLMs and accompanying safety threats, S-Eval can be flexibly configured and adapted to include new risks, attacks and models. S-Eval is extensively evaluated on 20 popular and representative LLMs. The results confirm that S-Eval can better reflect and inform the safety risks of LLMs compared to existing benchmarks. We also explore the impacts of parameter scales, language environments, and decoding parameters on the evaluation, providing a systematic methodology for evaluating the safety of LLMs.

* 18 pages, 11 figures

Lifelong Knowledge Editing for LLMs with Retrieval-Augmented Continuous Prompt Learning

May 08, 2024

Qizhou Chen, Taolin Zhang, Xiaofeng He, Dongyang Li, Chengyu Wang, Longtao Huang, Hui Xue

Abstract:Model editing aims to correct outdated or erroneous knowledge in large language models (LLMs) without the need for costly retraining. Lifelong model editing is the most challenging task that caters to the continuous editing requirements of LLMs. Prior works primarily focus on single or batch editing; nevertheless, these methods fall short in lifelong editing scenarios due to catastrophic knowledge forgetting and the degradation of model performance. Although retrieval-based methods alleviate these issues, they are impeded by slow and cumbersome processes of integrating the retrieved knowledge into the model. In this work, we introduce RECIPE, a RetriEval-augmented ContInuous Prompt lEarning method, to boost editing efficacy and inference efficiency in lifelong learning. RECIPE first converts knowledge statements into short and informative continuous prompts, prefixed to the LLM's input query embedding, to efficiently refine the response grounded on the knowledge. It further integrates the Knowledge Sentinel (KS) that acts as an intermediary to calculate a dynamic threshold, determining whether the retrieval repository contains relevant knowledge. Our retriever and prompt encoder are jointly trained to achieve editing properties, i.e., reliability, generality, and locality. In our experiments, RECIPE is assessed extensively across multiple LLMs and editing datasets, where it achieves superior editing performance. RECIPE also demonstrates its capability to maintain the overall performance of LLMs alongside showcasing fast editing and inference speed.
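
The abstract says the Knowledge Sentinel yields a dynamic threshold for deciding whether the repository holds relevant knowledge, but it does not give the rule. The sketch below is one plausible reading under stated assumptions: the sentinel is a vector in the same space as the knowledge entries, similarity is cosine, and an entry is returned only if it is closer to the query than the sentinel is. All names here are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_with_sentinel(
    query_vec: Sequence[float],
    repository: Dict[str, Sequence[float]],  # knowledge key -> embedding (assumed layout)
    sentinel_vec: Sequence[float],           # assumed learned sentinel embedding
) -> Optional[str]:
    """Return the best-matching knowledge key, or None if nothing beats the sentinel."""
    # The threshold depends on the query, which is what makes it dynamic.
    threshold = cosine(query_vec, sentinel_vec)
    best_key, best_sim = None, threshold
    for key, vec in repository.items():
        sim = cosine(query_vec, vec)
        if sim > best_sim:
            best_key, best_sim = key, sim
    return best_key
```

When the function returns `None`, no prompt would be prefixed and the unedited model answers the query, which is one way to preserve locality for unrelated inputs.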

* 14 pages, 4 figures, 6 tables

R4: Reinforced Retriever-Reorder-Responder for Retrieval-Augmented Large Language Models

May 04, 2024

Taolin Zhang, Dongyang Li, Qizhou Chen, Chengyu Wang, Longtao Huang, Hui Xue, Xiaofeng He, Jun Huang

Abstract:Retrieval-augmented large language models (LLMs) leverage relevant content retrieved by information retrieval systems to generate correct responses, aiming to alleviate the hallucination problem. However, existing retriever-responder methods typically append relevant documents to the prompt of LLMs to perform text generation tasks without considering the interaction of fine-grained structural semantics between the retrieved documents and the LLMs. This issue is particularly important for accurate response generation as LLMs tend to "lose in the middle" when dealing with input prompts augmented with lengthy documents. In this work, we propose a new pipeline named "Reinforced Retriever-Reorder-Responder" (R4) to learn document orderings for retrieval-augmented LLMs, thereby further enhancing their generation abilities while the large numbers of parameters of LLMs remain frozen. The reordering learning process is divided into two steps according to the quality of the generated responses: document order adjustment and document representation enhancement. Specifically, document order adjustment aims to organize retrieved document orderings into beginning, middle, and end positions based on graph attention learning, which maximizes the reinforced reward of response quality. Document representation enhancement further refines the representations of retrieved documents for responses of poor quality via document-level gradient adversarial learning. Extensive experiments demonstrate that our proposed pipeline achieves better factual question-answering performance on knowledge-intensive tasks compared to strong baselines across various public datasets. The source codes and trained models will be released upon paper acceptance.

Advancing low-field MRI with a universal denoising imaging transformer: Towards fast and high-quality imaging

Apr 30, 2024

Zheren Zhu, Azaan Rehman, Xiaozhi Cao, Congyu Liao, Yoo Jin Lee, Michael Ohliger, Hui Xue, Yang Yang

Abstract:Recent developments in low-field (LF) magnetic resonance imaging (MRI) systems present remarkable opportunities for affordable and widespread MRI access. A robust denoising method to overcome the intrinsic low signal-to-noise ratio (SNR) barrier is critical to the success of LF MRI. However, current data-driven MRI denoising methods predominantly handle magnitude images and rely on customized models with constrained data diversity and quantity, which exhibit limited generalizability in clinical applications across diverse MRI systems, pulse sequences, and organs. In this study, we present ImT-MRD: a complex-valued imaging transformer trained on a vast number of clinical MRI scans aiming at universal MR denoising on LF systems. Compared with averaging multiple repeated scans for higher image SNR, the model obtains better image quality from fewer repetitions, demonstrating its capability for accelerating scans under various clinical settings. Moreover, with its complex-valued image input, the model can denoise intermediate results before advanced post-processing and prepare high-quality data for further MRI research. By delivering universal and accurate denoising across clinical and research tasks, our model holds great promise to expedite the evolution of LF MRI for accessible and equal biomedical applications.
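
For context on the averaging baseline mentioned in the abstract: with independent zero-mean Gaussian noise, averaging N repetitions lowers the noise standard deviation by about a factor of sqrt(N), so SNR rises by sqrt(N) at N times the scan time. The simulation below is a minimal sketch of that relationship under those assumptions. It does not model the paper's transformer, and the function name and default parameters are illustrative.

```python
import random
import statistics


def noise_std_after_averaging(
    n_repeats: int,
    n_pixels: int = 5000,
    sigma: float = 1.0,
    seed: int = 0,
) -> float:
    """Estimate the noise standard deviation left after averaging n_repeats scans.

    Each simulated pixel holds pure Gaussian noise (signal set to zero), so the
    spread of the averaged values is the residual noise level.
    """
    rng = random.Random(seed)
    averaged = [
        sum(rng.gauss(0.0, sigma) for _ in range(n_repeats)) / n_repeats
        for _ in range(n_pixels)
    ]
    return statistics.pstdev(averaged)


if __name__ == "__main__":
    single = noise_std_after_averaging(1)
    sixteen = noise_std_after_averaging(16)
    # The ratio should be close to sqrt(16) = 4, up to sampling error.
    print(f"noise reduction from 16 averages: {single / sixteen:.2f}x")
```

Because the gain grows only with the square root of the repetition count, a denoiser that reaches comparable quality from fewer repetitions translates directly into shorter scans, which is the comparison the abstract draws.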

Imaging transformer for MRI denoising with the SNR unit training: enabling generalization across field-strengths, imaging contrasts, and anatomy

Apr 03, 2024

Hui Xue, Sarah Hooper, Azaan Rehman, Iain Pierce, Thomas Treibel, Rhodri Davies, W Patricia Bandettini, Rajiv Ramasawmy, Ahsan Javed, Zheren Zhu (+4 more)

Abstract:The ability to recover MRI signal from noise is key to achieving fast acquisition, accurate quantification, and high image quality. Past work has shown convolutional neural networks can be used with abundant and paired low and high-SNR images for training. However, for applications where high-SNR data is difficult to produce at scale (e.g. with aggressive acceleration, high resolution, or low field strength), training a new denoising network using a large quantity of high-SNR images can be infeasible. In this study, we overcome this limitation by improving the generalization of denoising models, enabling application to many settings beyond what appears in the training data. Specifically, we a) develop a training scheme that uses complex MRIs reconstructed in the SNR units (i.e., the images have a fixed noise level, SNR unit training) and augments images with realistic noise based on coil g-factor, and b) develop a novel imaging transformer (imformer) to handle 2D, 2D+T, and 3D MRIs in one model architecture. Through empirical evaluation, we show this combination improves performance compared to CNN models and improves generalization, enabling a denoising model to be used across field-strengths, image contrasts, and anatomy.

Inline AI: Open-source Deep Learning Inference for Cardiac MR

Apr 03, 2024

Hui Xue, Rhodri H Davies, James Howard, Hunain Shiwani, Azaan Rehman, Iain Pierce, Henry Procter, Marianna Fontana, James C Moon, Eylem Levelt (+1 more)

Abstract:Cardiac Magnetic Resonance (CMR) is established as a non-invasive imaging technique for evaluation of heart function, anatomy, and myocardial tissue characterization. Quantitative biomarkers are central for diagnosis and management of heart disease. Deep learning (DL) is playing an ever more important role in extracting these quantitative measures from CMR images. While many researchers have reported promising results in training and evaluating models, model deployment into the imaging workflow is less explored. A new imaging AI framework, the InlineAI, was developed and open-sourced. The main innovation is to enable the model inference inline as a part of imaging computation, instead of as an offline post-processing step, and to allow users to plug in their models. We demonstrate the system capability on three applications: long-axis CMR cine landmark detection, short-axis CMR cine analysis of function and anatomy, and quantitative perfusion mapping. The InlineAI allowed models to be deployed into the imaging workflow in a streaming manner directly on the scanner. The model was loaded and inference on incoming images was performed while the data acquisition was ongoing, and results were sent back to the scanner. Several biomarkers were extracted from model outputs in the demonstrated applications and reported as curves and tabular values. All processes are fully automated. The model inference was completed within 6-45s after the end of imaging data acquisition.