Le Sun

The Life Cycle of Knowledge in Big Language Models: A Survey

Mar 14, 2023
Boxi Cao, Hongyu Lin, Xianpei Han, Le Sun

Knowledge plays a critical role in artificial intelligence. Recently, the extensive success of pre-trained language models (PLMs) has drawn significant attention to how knowledge can be acquired, maintained, updated and used by language models. Despite the enormous number of related studies, a unified view of how knowledge circulates within language models throughout the learning, tuning, and application processes is still lacking, which may prevent us from further understanding the connections between current progress or realizing existing limitations. In this survey, we revisit PLMs as knowledge-based systems by dividing the life cycle of knowledge in PLMs into five critical periods, and investigating how knowledge circulates when it is built, maintained and used. To this end, we systematically review existing studies of each period of the knowledge life cycle, summarize the main challenges and current limitations, and discuss future directions.

* paperlist: https://github.com/c-box/KnowledgeLifecycle

Semantic-aware Contrastive Learning for More Accurate Semantic Parsing

Jan 19, 2023
Shan Wu, Chunlei Xin, Bo Chen, Xianpei Han, Le Sun

Since meaning representations are detailed and accurate annotations which express fine-grained sequence-level semantics, it is usually hard to train discriminative semantic parsers via Maximum Likelihood Estimation (MLE) in an autoregressive fashion. In this paper, we propose a semantic-aware contrastive learning algorithm, which can learn to distinguish fine-grained meaning representations and take the overall sequence-level semantics into consideration. Specifically, a multi-level online sampling algorithm is proposed to sample confusing and diverse instances. Three semantic-aware similarity functions are designed to accurately measure the distance between meaning representations as a whole. A ranked contrastive loss is proposed to pull the representations of semantically identical instances together and push negative instances away. Experiments on two standard datasets show that our approach achieves significant improvements over MLE baselines and achieves state-of-the-art performance by simply applying semantic-aware contrastive learning on a vanilla Seq2Seq model.

* Accepted by EMNLP 2022
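
The pull-together / push-away idea in the abstract above can be illustrated with a generic InfoNCE-style contrastive loss. This is only a minimal sketch of the general technique, not the paper's ranked contrastive loss or its semantic-aware similarity functions; the function names, the cosine similarity, and the temperature value are all assumptions chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss (hypothetical helper, not the paper's loss).

    The loss is small when the anchor is closer to the positive
    than to every negative, and large otherwise.
    """
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return log_denominator - pos_logit

anchor = [1.0, 0.0]
# Positive is near the anchor, negatives are far: low loss.
good_loss = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
# Positive is far from the anchor, one negative is near: high loss.
bad_loss = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(good_loss < bad_loss)
```

A ranked variant would presumably weight or order negatives by how semantically close they are to the target, which this sketch does not attempt.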

Universal Information Extraction as Unified Semantic Matching

Jan 09, 2023
Jie Lou, Yaojie Lu, Dai Dai, Wei Jia, Hongyu Lin, Xianpei Han, Le Sun, Hua Wu

The challenge of information extraction (IE) lies in the diversity of label schemas and the heterogeneity of structures. Traditional methods require task-specific model design and rely heavily on expensive supervision, making them difficult to generalize to new schemas. In this paper, we decouple IE into two basic abilities, structuring and conceptualizing, which are shared by different tasks and schemas. Based on this paradigm, we propose to universally model various IE tasks with the Unified Semantic Matching (USM) framework, which introduces three unified token linking operations to model the abilities of structuring and conceptualizing. In this way, USM can jointly encode schema and input text, uniformly extract substructures in parallel, and controllably decode target structures on demand. Empirical evaluation on 4 IE tasks shows that the proposed method achieves state-of-the-art performance in the supervised experiments and shows strong generalization ability in zero/few-shot transfer settings.

* accepted by AAAI2023

Bridging the Gap between Reality and Ideality of Entity Matching: A Revisiting and Benchmark Re-Construction

May 12, 2022
Tianshu Wang, Hongyu Lin, Cheng Fu, Xianpei Han, Le Sun, Feiyu Xiong, Hui Chen, Minlong Lu, Xiuwen Zhu

Entity matching (EM) is the most critical step for entity resolution (ER). While current deep learning-based methods achieve very impressive performance on standard EM benchmarks, their performance in real-world applications is much less satisfactory. In this paper, we highlight that this gap between reality and ideality stems from the unreasonable benchmark construction process, which is inconsistent with the nature of entity matching and therefore leads to biased evaluations of current EM approaches. To this end, we build a new EM corpus and re-construct EM benchmarks to challenge critical assumptions implicit in the previous benchmark construction process by changing, step by step, the restricted entities, balanced labels, and single-modal records in previous benchmarks into open entities, imbalanced labels, and multimodal records in an open environment. Experimental results demonstrate that the assumptions made in the previous benchmark construction process do not hold in the open environment; they conceal the main challenges of the task and therefore significantly overestimate the current progress of entity matching. The constructed benchmarks and code are publicly released.

* Accepted to IJCAI2022

Re-thinking Knowledge Graph Completion Evaluation from an Information Retrieval Perspective

May 09, 2022
Ying Zhou, Xuanang Chen, Ben He, Zheng Ye, Le Sun

Knowledge graph completion (KGC) aims to infer missing knowledge triples based on known facts in a knowledge graph. Current KGC research mostly follows an entity ranking protocol, wherein the effectiveness is measured by the predicted rank of a masked entity in a test triple. The overall performance is then given by a micro(-average) metric over all individual answer entities. Due to the incomplete nature of large-scale knowledge bases, such an entity ranking setting is likely affected by unlabelled top-ranked positive examples, raising questions on whether the current evaluation protocol is sufficient to guarantee a fair comparison of KGC systems. To this end, this paper presents a systematic study on whether and how label sparsity affects the current KGC evaluation with the popular micro metrics. Specifically, inspired by the TREC paradigm for large-scale information retrieval (IR) experimentation, we create a relatively "complete" judgment set based on a sample from the popular FB15k-237 dataset following the TREC pooling method. According to our analysis, it comes as a surprise that switching from the original labels to our "complete" labels results in a drastic change in the system ranking of 13 popular KGC models in terms of micro metrics. Further investigation indicates that the IR-like macro(-average) metrics are more stable and discriminative under different settings, and are less affected by label sparsity. Thus, for KGC evaluation, we recommend conducting TREC-style pooling to balance between human effort and label completeness, and also reporting the IR-like macro metrics to reflect the ranking nature of the KGC task.

* Accepted by SIGIR 2022, full paper
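
The micro versus macro distinction in the abstract above can be made concrete with a small sketch. It assumes that "micro" averages reciprocal rank over every individual answer entity, while "macro" first averages within each query and then across queries, as is common in IR evaluation; the paper's exact metric definitions may differ, and the function names and toy ranks below are hypothetical.

```python
def micro_mrr(ranks_by_query):
    """Average reciprocal rank over all answer entities, pooled together."""
    all_ranks = [r for ranks in ranks_by_query.values() for r in ranks]
    return sum(1.0 / r for r in all_ranks) / len(all_ranks)

def macro_mrr(ranks_by_query):
    """Average reciprocal rank per query first, then average over queries."""
    per_query = [
        sum(1.0 / r for r in ranks) / len(ranks)
        for ranks in ranks_by_query.values()
    ]
    return sum(per_query) / len(per_query)

# Toy data (made up): predicted rank of each correct answer, grouped by query.
# q1 has many easy answers; q2 has a single hard one.
ranks_by_query = {
    "q1": [1, 1, 1, 1],
    "q2": [4],
}

print(micro_mrr(ranks_by_query))  # dominated by q1's four answers
print(macro_mrr(ranks_by_query))  # q1 and q2 count equally
```

Under this reading, a query with many answer entities dominates the micro score, whereas the macro score gives each query equal weight.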

Groupwise Query Performance Prediction with BERT

Apr 25, 2022
Xiaoyang Chen, Ben He, Le Sun

While large-scale pre-trained language models like BERT have advanced the state-of-the-art in IR, their application in query performance prediction (QPP) is so far based on pointwise modeling of individual queries. Meanwhile, recent studies suggest that the cross-attention modeling of a group of documents can effectively boost performance for both learning-to-rank algorithms and BERT-based re-ranking. To this end, a BERT-based groupwise QPP model is proposed, in which the ranking contexts of a list of queries are jointly modeled to predict the relative performance of individual queries. Extensive experiments on three standard TREC collections showcase the effectiveness of our approach. Our code is available at https://github.com/VerdureChen/Group-QPP.

* Accepted at Proceedings of the 44th European Conference on Information Retrieval, ECIR 2022

Unified Structure Generation for Universal Information Extraction

Mar 23, 2022
Yaojie Lu, Qing Liu, Dai Dai, Xinyan Xiao, Hongyu Lin, Xianpei Han, Le Sun, Hua Wu

Information extraction suffers from its varying targets, heterogeneous structures, and demand-specific schemas. In this paper, we propose a unified text-to-structure generation framework, namely UIE, which can universally model different IE tasks, adaptively generate targeted structures, and collaboratively learn general IE abilities from different knowledge sources. Specifically, UIE uniformly encodes different extraction structures via a structured extraction language, adaptively generates target extractions via a schema-based prompt mechanism (the structural schema instructor), and captures the common IE abilities via a large-scale pre-trained text-to-structure model. Experiments show that UIE achieves state-of-the-art performance on 4 IE tasks and 13 datasets, in all of the supervised, low-resource, and few-shot settings, for a wide range of entity, relation, event and sentiment extraction tasks and their unification. These results verify the effectiveness, universality, and transferability of UIE.

* Accepted to the main conference of ACL2022

Pre-training to Match for Unified Low-shot Relation Extraction

Mar 23, 2022
Fangchao Liu, Hongyu Lin, Xianpei Han, Boxi Cao, Le Sun

Low-shot relation extraction (RE) aims to recognize novel relations with very few or even no samples, which is critical in real-world applications. Few-shot and zero-shot RE are two representative low-shot RE tasks, which seem to share a similar target but require totally different underlying abilities. In this paper, we propose Multi-Choice Matching Networks to unify low-shot relation extraction. To fill the gap between zero-shot and few-shot RE, we propose triplet-paraphrase meta-training, which leverages triplet paraphrase to pre-train the zero-shot label matching ability and uses a meta-learning paradigm to learn the few-shot instance summarizing ability. Experimental results on three different low-shot RE tasks show that the proposed method outperforms strong baselines by a large margin, and achieves the best performance on the few-shot RE leaderboard.

* Accepted to the main conference of ACL2022

ECO v1: Towards Event-Centric Opinion Mining

Mar 23, 2022
Ruoxi Xu, Hongyu Lin, Meng Liao, Xianpei Han, Jin Xu, Wei Tan, Yingfei Sun, Le Sun

Events are considered the fundamental building blocks of the world. Mining event-centric opinions can benefit decision making, communication between people, and social good. Unfortunately, there is little literature addressing event-centric opinion mining, although it diverges significantly from the well-studied entity-centric opinion mining in connotation, structure, and expression. In this paper, we propose and formulate the task of event-centric opinion mining based on event-argument structure and expression categorizing theory. We also benchmark this task by constructing a pioneering corpus and designing a two-step benchmark framework. Experimental results show that event-centric opinion mining is feasible and challenging, and the proposed task, dataset, and baselines are beneficial for future studies.

* Accepted to Findings of ACL2022

Can Prompt Probe Pretrained Language Models? Understanding the Invisible Risks from a Causal View

Mar 23, 2022
Boxi Cao, Hongyu Lin, Xianpei Han, Fangchao Liu, Le Sun

Prompt-based probing has been widely used in evaluating the abilities of pretrained language models (PLMs). Unfortunately, recent studies have discovered that such an evaluation may be inaccurate, inconsistent and unreliable. Furthermore, the lack of understanding of its inner workings, combined with its wide applicability, has the potential to lead to unforeseen risks for evaluating and applying PLMs in real-world applications. To discover, understand and quantify the risks, this paper investigates prompt-based probing from a causal view, highlights three critical biases which could induce biased results and conclusions, and proposes to conduct debiasing via causal intervention. This paper provides valuable insights for the design of unbiased datasets, better probing frameworks and more reliable evaluations of pretrained language models. Furthermore, our conclusions also echo that we need to rethink the criteria for identifying better pretrained language models. We openly released the source code and data at https://github.com/c-box/causalEval.

* Accepted to the main conference of ACL2022
