Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Lu Yu

PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression

Mar 30, 2024

Muhammad Asif Ali, Zhengping Li, Shu Yang, Keyuan Cheng, Yang Cao, Tianhao Huang, Lijie Hu, Lu Yu, Di Wang

Figure 1 for PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression

Figure 2 for PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression

Figure 3 for PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression

Figure 4 for PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression

Abstract:Large language models (LLMs) have shown exceptional abilities for multiple different natural language processing tasks. While prompting is a crucial tool for LLM inference, we observe that there is a significant cost associated with exceedingly lengthy prompts. Existing attempts to compress lengthy prompts lead to sub-standard results in terms of readability and interpretability of the compressed prompt, with a detrimental impact on prompt utility. To address this, we propose PROMPT-SAW: Prompt compresSion via Relation AWare graphs, an effective strategy for prompt compression over task-agnostic and task-aware prompts. PROMPT-SAW uses the prompt's textual information to build a graph, later extracts key information elements in the graph to come up with the compressed prompt. We also propose GSM8K-AUG, i.e., an extended version of the existing GSM8k benchmark for task-agnostic prompts in order to provide a comprehensive evaluation platform. Experimental evaluation using benchmark datasets shows that prompts compressed by PROMPT-SAW are not only better in terms of readability, but they also outperform the best-performing baseline models by up to 14.3 and 13.7 respectively for task-aware and task-agnostic settings while compressing the original prompt text by 33.0 and 56.7.

Via

Access Paper or Ask Questions

Multi-hop Question Answering under Temporal Knowledge Editing

Mar 30, 2024

Keyuan Cheng, Gang Lin, Haoyang Fei, Yuxuan zhai, Lu Yu, Muhammad Asif Ali, Lijie Hu, Di Wang

Figure 1 for Multi-hop Question Answering under Temporal Knowledge Editing

Figure 2 for Multi-hop Question Answering under Temporal Knowledge Editing

Figure 3 for Multi-hop Question Answering under Temporal Knowledge Editing

Figure 4 for Multi-hop Question Answering under Temporal Knowledge Editing

Abstract:Multi-hop question answering (MQA) under knowledge editing (KE) has garnered significant attention in the era of large language models. However, existing models for MQA under KE exhibit poor performance when dealing with questions containing explicit temporal contexts. To address this limitation, we propose a novel framework, namely TEMPoral knowLEdge augmented Multi-hop Question Answering (TEMPLE-MQA). Unlike previous methods, TEMPLE-MQA first constructs a time-aware graph (TAG) to store edit knowledge in a structured manner. Then, through our proposed inference path, structural retrieval, and joint reasoning stages, TEMPLE-MQA effectively discerns temporal contexts within the question query. Experiments on benchmark datasets demonstrate that TEMPLE-MQA significantly outperforms baseline models. Additionally, we contribute a new dataset, namely TKEMQA, which serves as the inaugural benchmark tailored specifically for MQA with temporal scopes.

* 23 pages

Via

Access Paper or Ask Questions

Parallelized Midpoint Randomization for Langevin Monte Carlo

Feb 23, 2024

Lu Yu, Arnak Dalalyan

Abstract:We explore the sampling problem within the framework where parallel evaluations of the gradient of the log-density are feasible. Our investigation focuses on target distributions characterized by smooth and strongly log-concave densities. We revisit the parallelized randomized midpoint method and employ proof techniques recently developed for analyzing its purely sequential version. Leveraging these techniques, we derive upper bounds on the Wasserstein distance between the sampling and target densities. These bounds quantify the runtime improvement achieved by utilizing parallel processing units, which can be considerable.

* arXiv admin note: substantial text overlap with arXiv:2306.08494

Via

Access Paper or Ask Questions

Human-AI Interactions in the Communication Era: Autophagy Makes Large Models Achieving Local Optima

Feb 17, 2024

Shu Yang, Lijie Hu, Lu Yu, Muhammad Asif Ali, Di Wang

Figure 1 for Human-AI Interactions in the Communication Era: Autophagy Makes Large Models Achieving Local Optima

Figure 2 for Human-AI Interactions in the Communication Era: Autophagy Makes Large Models Achieving Local Optima

Figure 3 for Human-AI Interactions in the Communication Era: Autophagy Makes Large Models Achieving Local Optima

Figure 4 for Human-AI Interactions in the Communication Era: Autophagy Makes Large Models Achieving Local Optima

Abstract:The increasing significance of large language and multimodal models in societal information processing has ignited debates on social safety and ethics. However, few studies have approached the analysis of these limitations from the comprehensive perspective of human and artificial intelligence system interactions. This study investigates biases and preferences when humans and large models are used as key links in communication. To achieve this, we design a multimodal dataset and three different experiments to evaluate generative models in their roles as producers and disseminators of information. Our main findings highlight that synthesized information is more likely to be incorporated into model training datasets and messaging than human-generated information. Additionally, large models, when acting as transmitters of information, tend to modify and lose specific content selectively. Conceptually, we present two realistic models of autophagic ("self-consumption") loops to account for the suppression of human-generated information in the exchange of information between humans and AI systems. We generalize the declining diversity of social information and the bottleneck in model performance caused by the above trends to the local optima of large models.

Via

Access Paper or Ask Questions

Professional Agents -- Evolving Large Language Models into Autonomous Experts with Human-Level Competencies

Feb 06, 2024

Zhixuan Chu, Yan Wang, Feng Zhu, Lu Yu, Longfei Li, Jinjie Gu

Abstract:The advent of large language models (LLMs) such as ChatGPT, PaLM, and GPT-4 has catalyzed remarkable advances in natural language processing, demonstrating human-like language fluency and reasoning capacities. This position paper introduces the concept of Professional Agents (PAgents), an application framework harnessing LLM capabilities to create autonomous agents with controllable, specialized, interactive, and professional-level competencies. We posit that PAgents can reshape professional services through continuously developed expertise. Our proposed PAgents framework entails a tri-layered architecture for genesis, evolution, and synergy: a base tool layer, a middle agent layer, and a top synergy layer. This paper aims to spur discourse on promising real-world applications of LLMs. We argue the increasing sophistication and integration of PAgents could lead to AI systems exhibiting professional mastery over complex domains, serving critical needs, and potentially achieving artificial general intelligence.

* 14 pages, 1 figure

Via

Access Paper or Ask Questions

Hierarchical Prompts for Rehearsal-free Continual Learning

Jan 21, 2024

Yukun Zuo, Hantao Yao, Lu Yu, Liansheng Zhuang, Changsheng Xu

Abstract:Continual learning endeavors to equip the model with the capability to integrate current task knowledge while mitigating the forgetting of past task knowledge. Inspired by prompt tuning, prompt-based methods maintain a frozen backbone and train with slight learnable prompts to minimize the catastrophic forgetting that arises due to updating a large number of backbone parameters. Nonetheless, these learnable prompts tend to concentrate on the discriminatory knowledge of the current task while ignoring past task knowledge, leading to that learnable prompts still suffering from catastrophic forgetting. This paper introduces a novel rehearsal-free paradigm for continual learning termed Hierarchical Prompts (H-Prompts), comprising three categories of prompts -- class prompt, task prompt, and general prompt. To effectively depict the knowledge of past classes, class prompt leverages Bayesian Distribution Alignment to model the distribution of classes in each task. To reduce the forgetting of past task knowledge, task prompt employs Cross-task Knowledge Excavation to amalgamate the knowledge encapsulated in the learned class prompts of past tasks and current task knowledge. Furthermore, general prompt utilizes Generalized Knowledge Exploration to deduce highly generalized knowledge in a self-supervised manner. Evaluations on two benchmarks substantiate the efficacy of the proposed H-Prompts, exemplified by an average accuracy of 87.8% in Split CIFAR-100 and 70.6% in Split ImageNet-R.

* Submitted to TPAMI

Via

Access Paper or Ask Questions

Fine-Grained Knowledge Selection and Restoration for Non-Exemplar Class Incremental Learning

Dec 20, 2023

Jiang-Tian Zhai, Xialei Liu, Lu Yu, Ming-Ming Cheng

Abstract:Non-exemplar class incremental learning aims to learn both the new and old tasks without accessing any training data from the past. This strict restriction enlarges the difficulty of alleviating catastrophic forgetting since all techniques can only be applied to current task data. Considering this challenge, we propose a novel framework of fine-grained knowledge selection and restoration. The conventional knowledge distillation-based methods place too strict constraints on the network parameters and features to prevent forgetting, which limits the training of new tasks. To loose this constraint, we proposed a novel fine-grained selective patch-level distillation to adaptively balance plasticity and stability. Some task-agnostic patches can be used to preserve the decision boundary of the old task. While some patches containing the important foreground are favorable for learning the new task. Moreover, we employ a task-agnostic mechanism to generate more realistic prototypes of old tasks with the current task sample for reducing classifier bias for fine-grained knowledge restoration. Extensive experiments on CIFAR100, TinyImageNet and ImageNet-Subset demonstrate the effectiveness of our method. Code is available at https://github.com/scok30/vit-cil.

* to appear at AAAI 2024

Via

Access Paper or Ask Questions

VeRi3D: Generative Vertex-based Radiance Fields for 3D Controllable Human Image Synthesis

Sep 09, 2023

Xinya Chen, Jiaxin Huang, Yanrui Bin, Lu Yu, Yiyi Liao

Abstract:Unsupervised learning of 3D-aware generative adversarial networks has lately made much progress. Some recent work demonstrates promising results of learning human generative models using neural articulated radiance fields, yet their generalization ability and controllability lag behind parametric human models, i.e., they do not perform well when generalizing to novel pose/shape and are not part controllable. To solve these problems, we propose VeRi3D, a generative human vertex-based radiance field parameterized by vertices of the parametric human template, SMPL. We map each 3D point to the local coordinate system defined on its neighboring vertices, and use the corresponding vertex feature and local coordinates for mapping it to color and density values. We demonstrate that our simple approach allows for generating photorealistic human images with free control over camera pose, human pose, shape, as well as enabling part-level editing.

Via

Access Paper or Ask Questions

Langevin Monte Carlo for strongly log-concave distributions: Randomized midpoint revisited

Jun 16, 2023

Lu Yu, Avetik Karagulyan, Arnak Dalalyan

Figure 1 for Langevin Monte Carlo for strongly log-concave distributions: Randomized midpoint revisited

Abstract:We revisit the problem of sampling from a target distribution that has a smooth strongly log-concave density everywhere in $\mathbb R^p$. In this context, if no additional density information is available, the randomized midpoint discretization for the kinetic Langevin diffusion is known to be the most scalable method in high dimensions with large condition numbers. Our main result is a nonasymptotic and easy to compute upper bound on the Wasserstein-2 error of this method. To provide a more thorough explanation of our method for establishing the computable upper bound, we conduct an analysis of the midpoint discretization for the vanilla Langevin process. This analysis helps to clarify the underlying principles and provides valuable insights that we use to establish an improved upper bound for the kinetic Langevin process with the midpoint discretization. Furthermore, by applying these techniques we establish new guarantees for the kinetic Langevin process with Euler discretization, which have a better dependence on the condition number than existing upper bounds.

Via

Access Paper or Ask Questions

Edit-DiffNeRF: Editing 3D Neural Radiance Fields using 2D Diffusion Model

Jun 15, 2023

Lu Yu, Wei Xiang, Kang Han

Abstract:Recent research has demonstrated that the combination of pretrained diffusion models with neural radiance fields (NeRFs) has emerged as a promising approach for text-to-3D generation. Simply coupling NeRF with diffusion models will result in cross-view inconsistency and degradation of stylized view syntheses. To address this challenge, we propose the Edit-DiffNeRF framework, which is composed of a frozen diffusion model, a proposed delta module to edit the latent semantic space of the diffusion model, and a NeRF. Instead of training the entire diffusion for each scene, our method focuses on editing the latent semantic space in frozen pretrained diffusion models by the delta module. This fundamental change to the standard diffusion framework enables us to make fine-grained modifications to the rendered views and effectively consolidate these instructions in a 3D scene via NeRF training. As a result, we are able to produce an edited 3D scene that faithfully aligns to input text instructions. Furthermore, to ensure semantic consistency across different viewpoints, we propose a novel multi-view semantic consistency loss that extracts a latent semantic embedding from the input view as a prior, and aim to reconstruct it in different views. Our proposed method has been shown to effectively edit real-world 3D scenes, resulting in 25% improvement in the alignment of the performed 3D edits with text instructions compared to prior work.

Via

Access Paper or Ask Questions