Yicheng Zou

The Rise and Potential of Large Language Model Based Agents: A Survey

Sep 19, 2023

Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou(+19 more)

Abstract: For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. Many efforts have been made to develop intelligent agents, but they mainly focus on advancement in algorithms or training strategies to enhance specific capabilities or performance on particular tasks. Actually, what the community lacks is a general and powerful model to serve as a starting point for designing AI agents that can adapt to diverse scenarios. Due to the versatile capabilities they demonstrate, large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI), offering hope for building general AI agents. Many researchers have leveraged LLMs as the foundation to build AI agents and have achieved significant progress. In this paper, we perform a comprehensive survey on LLM-based agents. We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents. Building upon this, we present a general framework for LLM-based agents, comprising three main components: brain, perception, and action. The framework can be tailored for different applications. Subsequently, we explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation. Following this, we delve into agent societies, exploring the behavior and personality of LLM-based agents, the social phenomena that emerge from an agent society, and the insights they offer for human society. Finally, we discuss several key topics and open problems within the field. A repository of the related papers is available at https://github.com/WooooDyy/LLM-Agent-Paper-List.

* 86 pages, 12 figures

Brain Tissue Segmentation Across the Human Lifespan via Supervised Contrastive Learning

Jan 03, 2023

Xiaoyang Chen, Jinjian Wu, Wenjiao Lyu, Yicheng Zou, Kim-Han Thung, Siyuan Liu, Ye Wu, Sahar Ahmad, Pew-Thian Yap

Abstract: Automatic segmentation of brain MR images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) is critical for tissue volumetric analysis and cortical surface reconstruction. Due to dramatic structural and appearance changes associated with developmental and aging processes, existing brain tissue segmentation methods are only viable for specific age groups. Consequently, methods developed for one age group may fail for another. In this paper, we make the first attempt to segment brain tissues across the entire human lifespan (0-100 years of age) using a unified deep learning model. To overcome the challenges related to structural variability underpinned by biological processes, intensity inhomogeneity, motion artifacts, scanner-induced differences, and acquisition protocols, we propose to use contrastive learning to improve the quality of feature representations in a latent space for effective lifespan tissue segmentation. We compared our approach with commonly used segmentation methods on a large-scale dataset of 2,464 MR images. Experimental results show that our model accurately segments brain tissues across the lifespan and outperforms existing methods.

Towards Understanding Omission in Dialogue Summarization

Nov 14, 2022

Yicheng Zou, Kaitao Song, Xu Tan, Zhongkai Fu, Tao Gui, Qi Zhang, Dongsheng Li

Abstract: Dialogue summarization aims to condense a lengthy dialogue into a concise summary, and has recently achieved significant progress. However, the results of existing methods are still far from satisfactory. Previous works indicated that omission is a major factor affecting the quality of summarization, but few of them have further explored the omission problem, such as how omission affects summarization results and how to detect omission, which is critical for reducing omission and improving summarization quality. Moreover, analyzing and detecting omission relies on summarization datasets with omission labels (i.e., which dialogue utterances are omitted in the summarization), which are not available in the current literature. In this paper, we propose the OLDS dataset, which provides high-quality Omission Labels for Dialogue Summarization. By analyzing this dataset, we find that a large improvement in summarization quality can be achieved by providing ground-truth omission labels for the summarization model to recover omission information, which demonstrates the importance of omission detection for omission mitigation in dialogue summarization. Therefore, we formulate an omission detection task and demonstrate that our proposed dataset can support the training and evaluation of this task well. We also call for research action on omission detection based on our proposed dataset. Our dataset and code are publicly available.

Increasing Visual Awareness in Multimodal Neural Machine Translation from an Information Theoretic Perspective

Oct 16, 2022

Baijun Ji, Tong Zhang, Yicheng Zou, Bojie Hu, Si Shen

Abstract: Multimodal machine translation (MMT) aims to improve translation quality by equipping the source sentence with its corresponding image. Despite the promising performance, MMT models still suffer from the problem of input degradation: models focus more on textual information while visual information is generally overlooked. In this paper, we endeavor to improve MMT performance by increasing visual awareness from an information theoretic perspective. In detail, we decompose the informative visual signals into two parts: source-specific information and target-specific information. We use mutual information to quantify them and propose two methods for objective optimization to better leverage visual signals. Experiments on two datasets demonstrate that our approach can effectively enhance the visual awareness of MMT models and achieve superior results against strong baselines.
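As a rough illustration of the quantity this abstract relies on, the sketch below computes mutual information for two discrete variables from paired samples, using the textbook definition I(X;Y) = sum over (x, y) of p(x,y) log[p(x,y) / (p(x) p(y))]. It is not the paper's estimator: neural models typically need learned bounds on mutual information, and the function name `mutual_information` and the toy data are assumptions made here for illustration only.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Empirical mutual information (in nats) of paired discrete samples.

    Illustrative only: uses plug-in frequency estimates, which is not
    how mutual information is usually optimized in neural models.
    """
    n = len(pairs)
    joint = Counter(pairs)                 # counts of (x, y)
    px = Counter(x for x, _ in pairs)      # marginal counts of x
    py = Counter(y for _, y in pairs)      # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x,y) / (p(x) p(y)) written with raw counts
        mi += p_xy * math.log(count * n / (px[x] * py[y]))
    return mi

# Hypothetical toy data: y copies x, versus y independent of x.
dependent = [(0, 0), (1, 1)] * 2
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]
mi_dependent = mutual_information(dependent)
mi_independent = mutual_information(independent)
```

When one variable fully determines the other, the value equals the entropy of that variable; when the two are independent, it is zero.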

* 10 pages, 4 figures; EMNLP main conference

Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling

Jun 05, 2022

Kaitao Song, Yichong Leng, Xu Tan, Yicheng Zou, Tao Qin, Dongsheng Li

Abstract: Sentence scoring aims at measuring the likelihood score of a sentence and is widely used in many natural language processing scenarios, like reranking, which is to select the best sentence from multiple candidates. Previous works on sentence scoring mainly adopted either causal language modeling (CLM) like GPT or masked language modeling (MLM) like BERT, which have some limitations: 1) CLM only utilizes unidirectional information for the probability estimation of a sentence without considering bidirectional context, which affects the scoring quality; 2) MLM can only estimate the probability of partial tokens at a time and thus requires multiple forward passes to estimate the probability of the whole sentence, which incurs large computation and time cost. In this paper, we propose Transcormer, a Transformer model with a novel sliding language modeling (SLM) for sentence scoring. Specifically, our SLM adopts a triple-stream self-attention mechanism to estimate the probability of all tokens in a sentence with bidirectional context and only requires a single forward pass. SLM can avoid the limitations of CLM (only unidirectional context) and MLM (multiple forward passes) and inherit their advantages, and thus achieve high effectiveness and efficiency in scoring. Experimental results on multiple tasks demonstrate that our method achieves better performance than other language modeling approaches.
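The trade-off between CLM and MLM scoring described in this abstract can be made concrete with a toy sketch. The bigram table below is a hypothetical stand-in for a causal language model, and all names (`BIGRAM`, `clm_score`, `rerank`, `mlm_forward_passes`) are assumptions for illustration; none of this is the Transcormer model or its SLM objective.

```python
import math

# Hypothetical bigram probabilities standing in for a causal LM.
BIGRAM = {
    ("<s>", "the"): 0.5,
    ("the", "cat"): 0.2,
    ("cat", "sat"): 0.3,
    ("the", "sat"): 0.01,
    ("sat", "cat"): 0.01,
}
FLOOR = 1e-4  # assumed probability for unseen bigrams

def clm_score(tokens):
    """Sentence log-likelihood with left-to-right context only.

    A real causal LM yields all of these conditionals in one forward
    pass; here they are looked up in the toy table.
    """
    prev, total = "<s>", 0.0
    for tok in tokens:
        total += math.log(BIGRAM.get((prev, tok), FLOOR))
        prev = tok
    return total

def rerank(candidates):
    """Pick the candidate sentence with the highest score."""
    return max(candidates, key=clm_score)

def mlm_forward_passes(tokens):
    """Masking one token at a time costs one forward pass per token."""
    return len(tokens)

candidates = [["the", "cat", "sat"], ["the", "sat", "cat"]]
best = rerank(candidates)
passes_needed = mlm_forward_passes(best)
```

The sketch shows the two costs the abstract contrasts: the causal scorer never sees tokens to the right of the one being scored, while masked scoring needs as many passes as the sentence has tokens.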

MINER: Improving Out-of-Vocabulary Named Entity Recognition from an Information Theoretic Perspective

Apr 09, 2022

Xiao Wang, Shihan Dou, Limao Xiong, Yicheng Zou, Qi Zhang, Tao Gui, Liang Qiao, Zhanzhan Cheng, Xuanjing Huang

Abstract: NER models have achieved promising performance on standard NER benchmarks. However, recent studies show that previous approaches may over-rely on entity mention information, resulting in poor performance on out-of-vocabulary (OOV) entity recognition. In this work, we propose MINER, a novel NER learning framework, to remedy this issue from an information-theoretic perspective. The proposed approach contains two mutual information-based training objectives: i) generalizing information maximization, which enhances representation via deep understanding of context and entity surface forms; ii) superfluous information minimization, which discourages representation from rote memorizing entity names or exploiting biased cues in data. Experiments on various settings and datasets demonstrate that it achieves better performance in predicting OOV entities.

Divide and Conquer: Text Semantic Matching with Disentangled Keywords and Intents

Mar 06, 2022

Yicheng Zou, Hongwei Liu, Tao Gui, Junzhe Wang, Qi Zhang, Meng Tang, Haixiang Li, Daniel Wang

Abstract: Text semantic matching is a fundamental task that has been widely used in various scenarios, such as community question answering, information retrieval, and recommendation. Most state-of-the-art matching models, e.g., BERT, directly perform text comparison by processing each word uniformly. However, a query sentence generally comprises content that calls for different levels of matching granularity. Specifically, keywords represent factual information such as action, entity, and event that should be strictly matched, while intents convey abstract concepts and ideas that can be paraphrased into various expressions. In this work, we propose a simple yet effective training strategy for text semantic matching in a divide-and-conquer manner by disentangling keywords from intents. Our approach can be easily combined with pre-trained language models (PLMs) without influencing their inference efficiency, achieving stable performance improvements against a wide range of PLMs on three benchmarks.

* Accepted by Findings of ACL 2022, 11 pages

Learning Implicit Sentiment in Aspect-based Sentiment Analysis with Supervised Contrastive Pre-Training

Nov 03, 2021

Zhengyan Li, Yicheng Zou, Chong Zhang, Qi Zhang, Zhongyu Wei

Abstract: Aspect-based sentiment analysis aims to identify the sentiment polarity of a specific aspect in product reviews. We notice that about 30% of reviews do not contain obvious opinion words, but still convey clear human-aware sentiment orientation, which is known as implicit sentiment. However, recent neural network-based approaches paid little attention to implicit sentiment entailed in the reviews. To overcome this issue, we adopt Supervised Contrastive Pre-training on large-scale sentiment-annotated corpora retrieved from in-domain language resources. By aligning the representation of implicit sentiment expressions to those with the same sentiment label, the pre-training process leads to better capture of both implicit and explicit sentiment orientation towards aspects in reviews. Experimental results show that our method achieves state-of-the-art performance on SemEval2014 benchmarks, and comprehensive analysis validates its effectiveness on learning implicit sentiment.
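The idea of aligning representations that share a sentiment label can be sketched with a generic supervised contrastive loss. The form below is the commonly used one (each anchor's same-label examples are positives, all other examples in the batch form the denominator); whether the paper uses exactly this formulation is not stated in the abstract, so the function `supcon_loss`, the temperature value, and the toy embeddings are assumptions for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def supcon_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over one batch (pure Python).

    For each anchor i with positives P(i) (same label, excluding i):
      L_i = -1/|P(i)| * sum_p log( exp(z_i.z_p / t) / sum_{a != i} exp(z_i.z_a / t) )
    Anchors without any positive are skipped.
    """
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        logits = {a: dot(z[i], z[a]) / temperature for a in range(n) if a != i}
        log_denom = math.log(sum(math.exp(s) for s in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

# Hypothetical 2-D embeddings: the first two and the last two are close.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = supcon_loss(emb, [0, 0, 1, 1])     # labels match geometry
loss_misaligned = supcon_loss(emb, [0, 1, 0, 1])  # labels contradict geometry
```

The loss is small when same-label embeddings already sit close together and large when they do not, which is the pressure that pulls implicit and explicit expressions of the same sentiment toward each other during pre-training.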

* Accepted as a long paper in the main conference of EMNLP 2021

Low-Resource Dialogue Summarization with Domain-Agnostic Multi-Source Pretraining

Sep 11, 2021

Yicheng Zou, Bolin Zhu, Xingwu Hu, Tao Gui, Qi Zhang

Abstract: With the rapid increase in the volume of dialogue data from daily life, there is a growing demand for dialogue summarization. Unfortunately, training a large summarization model is generally infeasible due to the inadequacy of dialogue data with annotated summaries. Most existing works for low-resource dialogue summarization directly pretrain models in other domains, e.g., the news domain, but they generally neglect the huge difference between dialogues and conventional articles. To bridge the gap between out-of-domain pretraining and in-domain fine-tuning, in this work, we propose a multi-source pretraining paradigm to better leverage the external summary data. Specifically, we exploit large-scale in-domain non-summary data to separately pretrain the dialogue encoder and the summary decoder. The combined encoder-decoder model is then pretrained on the out-of-domain summary data using adversarial critics, aiming to facilitate domain-agnostic summarization. The experimental results on two public datasets show that with only limited training data, our approach achieves competitive performance and generalizes well in different dialogue scenarios.

* Accepted by EMNLP 2021, 12 pages

Thinking Clearly, Talking Fast: Concept-Guided Non-Autoregressive Generation for Open-Domain Dialogue Systems

Sep 09, 2021

Yicheng Zou, Zhihua Liu, Xingwu Hu, Qi Zhang

Abstract: Human dialogue contains evolving concepts, and speakers naturally associate multiple concepts to compose a response. However, current dialogue models with the seq2seq framework lack the ability to effectively manage concept transitions and can hardly introduce multiple concepts to responses in a sequential decoding manner. To facilitate a controllable and coherent dialogue, in this work, we devise a concept-guided non-autoregressive model (CG-nAR) for open-domain dialogue generation. The proposed model comprises a multi-concept planning module that learns to identify multiple associated concepts from a concept graph and a customized Insertion Transformer that performs concept-guided non-autoregressive generation to complete a response. The experimental results on two public datasets show that CG-nAR can produce diverse and coherent responses, outperforming state-of-the-art baselines in both automatic and human evaluations with substantially faster inference speed.

* Accepted by EMNLP 2021, 12 pages
