Lei Li

Extrapolating Multilingual Understanding Models as Multilingual Generators

May 22, 2023
Bohong Wu, Fei Yuan, Hai Zhao, Lei Li, Jingjing Xu

Multilingual understanding models (i.e., encoder-based models such as mBERT), pre-trained via masked language modeling, have achieved promising results on many language understanding tasks. However, these non-autoregressive (NAR) models still struggle to generate high-quality texts compared with autoregressive (AR) models. Considering that encoder-based models offer efficient generation and self-correction abilities, this paper explores methods to endow multilingual understanding models with generation abilities, yielding a unified model. Specifically, we start from a multilingual encoder (XLM-R) and propose a \textbf{S}emantic-\textbf{G}uided \textbf{A}lignment-then-Denoising (SGA) approach to adapt the encoder into a multilingual generator with a small number of new parameters. Experiments show that the proposed approach is an effective adaptation method, outperforming widely used initialization-based methods with gains of 9.4 BLEU on machine translation, 8.1 Rouge-L on question generation, and 5.5 METEOR on story generation with XLM-R$_{large}$. On the other hand, we observe that XLM-R is still inferior to mBART in supervised settings despite better results in zero-shot settings, indicating that more exploration is required to make understanding models strong generators.
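
As a rough intuition for how an encoder-only masked LM can act as a non-autoregressive generator with self-correction, the sketch below runs generic mask-predict-style iterative refinement on top of an off-the-shelf XLM-R checkpoint. The prompt handling, target length, and re-masking schedule are illustrative assumptions; this is not the SGA method from the paper.

```python
# Hypothetical sketch: iterative mask-predict decoding with an off-the-shelf
# XLM-R masked LM, illustrating how an encoder-only model can generate text
# non-autoregressively and revise its own low-confidence predictions.
# This is NOT the paper's SGA approach, only a generic illustration.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
mlm = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base").eval()

@torch.no_grad()
def mask_predict(prompt: str, target_len: int = 8, iterations: int = 4) -> str:
    # Start from a fully masked target appended to the prompt.
    prompt_ids = tok(prompt, add_special_tokens=False).input_ids
    target = [tok.mask_token_id] * target_len
    for t in range(iterations):
        ids = [tok.cls_token_id] + prompt_ids + target + [tok.sep_token_id]
        logits = mlm(input_ids=torch.tensor([ids])).logits[0]
        probs = logits.softmax(-1)
        start = 1 + len(prompt_ids)                      # first target position
        conf, pred = probs[start:start + target_len].max(-1)
        target = pred.tolist()
        # Self-correction: re-mask the least confident tokens for the next pass.
        n_remask = int(target_len * (1 - (t + 1) / iterations))
        for i in conf.argsort()[:n_remask].tolist():
            target[i] = tok.mask_token_id
    return tok.decode(target)

print(mask_predict("The capital of France is"))
```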

Can We Edit Factual Knowledge by In-Context Learning?

May 22, 2023
Ce Zheng, Lei Li, Qingxiu Dong, Yuxuan Fan, Zhiyong Wu, Jingjing Xu, Baobao Chang

Previous studies have shown that large language models (LLMs) like GPTs store massive factual knowledge in their parameters. However, the stored knowledge could be false or outdated. Traditional knowledge editing methods refine LLMs via fine-tuning on texts containing specific knowledge. However, with the increasing scale of LLMs, these gradient-based approaches bring large computation costs. The trend of model-as-a-service also makes it impossible to modify knowledge in black-box LMs. Inspired by in-context learning (ICL), a new paradigm based on demonstration contexts without parameter updating, we explore whether ICL can edit factual knowledge. To answer this question, we give a comprehensive empirical study of ICL strategies. Experiments show that in-context knowledge editing (IKE), without any gradient and parameter updating, achieves a competitive success rate compared to gradient-based methods on GPT-J (6B) but with far fewer side effects, including less over-editing on similar but unrelated facts and less knowledge forgetting on previously stored knowledge. We also apply the method to larger LMs with tens or hundreds of billions of parameters, such as OPT-175B, which shows the scalability of our method. The code is available at https://github.com/Zce1112zslx/IKE.
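
To make the prompt-only editing setup concrete, here is a minimal sketch: a counterfactual "new fact" and a hand-written demonstration are placed in the context and the model is queried without any parameter update. The demonstration template and the small GPT-2 model are placeholders, not the paper's GPT-J/OPT setup or its IKE prompt design.

```python
# Minimal illustration of in-context knowledge editing: the edited fact lives
# only in the prompt; the model's parameters are never touched.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # placeholder model

demonstration = (
    "New fact: The Eiffel Tower is located in Rome.\n"
    "Q: In which city is the Eiffel Tower located?\nA: Rome.\n\n"
)
edit = "New fact: The Space Needle is located in Paris.\n"
query = "Q: In which city is the Space Needle located?\nA:"

prompt = demonstration + edit + query
out = generator(prompt, max_new_tokens=5, do_sample=False)[0]["generated_text"]
print(out[len(prompt):].strip())
```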

Communication Efficient Federated Learning for Multilingual Neural Machine Translation with Adapter

May 21, 2023
Yi Liu, Xiaohan Bi, Lei Li, Sishuo Chen, Wenkai Yang, Xu Sun

Federated Multilingual Neural Machine Translation (Fed-MNMT) has emerged as a promising paradigm for institutions with limited language resources. This approach allows multiple institutions to act as clients and train a unified model through model synchronization, rather than collecting sensitive data for centralized training. This significantly reduces the cost of corpus collection and preserves data privacy. However, as pre-trained language models (PLMs) continue to increase in size, the communication cost for transmitting parameters during synchronization has become a training speed bottleneck. In this paper, we propose a communication-efficient Fed-MNMT framework that addresses this issue by keeping PLMs frozen and only transferring lightweight adapter modules between clients. Since different language pairs exhibit substantial discrepancies in data distributions, adapter parameters of clients may conflict with each other. To tackle this, we explore various clustering strategies to group parameters for integration and mitigate the negative effects of conflicting parameters. Experimental results demonstrate that our framework reduces communication cost by over 98% while achieving similar or even better performance compared to competitive baselines. Further analysis reveals that clustering strategies effectively solve the problem of linguistic discrepancy and pruning adapter modules further improves communication efficiency.

* Findings of ACL 2023 
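
The communication pattern described above can be sketched as follows: a frozen backbone that never leaves the client, and a small adapter whose tensors are the only thing averaged at the server. The tiny two-layer model and the name-based adapter filter are assumptions for illustration; the paper's clustering of adapters by language is omitted.

```python
# Sketch of one federated synchronization round where only adapter weights
# are communicated and averaged; the backbone stays frozen on every client.
import torch
import torch.nn as nn

class ClientModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(16, 16)   # frozen, never communicated
        self.adapter = nn.Linear(16, 16)    # trainable, communicated
        self.backbone.requires_grad_(False)

    def forward(self, x):
        return self.adapter(self.backbone(x))

def adapter_state(model):
    return {k: v for k, v in model.state_dict().items() if k.startswith("adapter")}

clients = [ClientModel() for _ in range(3)]

# Server step: average only the adapter tensors across clients.
averaged = {
    k: torch.stack([adapter_state(c)[k] for c in clients]).mean(0)
    for k in adapter_state(clients[0])
}
for c in clients:
    c.load_state_dict(averaged, strict=False)  # backbone untouched

full = sum(p.numel() for p in clients[0].parameters())
sent = sum(v.numel() for v in averaged.values())
print(f"communicated {sent}/{full} parameters ({100 * sent / full:.1f}%)")
```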

Statistical Knowledge Assessment for Generative Language Models

May 17, 2023
Qingxiu Dong, Jingjing Xu, Lingpeng Kong, Zhifang Sui, Lei Li

Generative Language Models (GLMs) have demonstrated capabilities to store factual knowledge and answer queries efficiently. Given varying prompts, does a GLM consistently generate factually correct answers? In this paper, we introduce a statistical knowledge assessment framework guided by latent variables and the KaRR metric, which quantifies a model's knowledge by computing its continuous probability across diverse text forms. We conduct a comprehensive comparison of knowledge across 14 GLMs using our framework, including LLaMA, Alpaca, OPT, and others. Our statistical knowledge assessment encompasses 600 relation types and exhibits a strong correlation (0.43 Kendall's $\tau$) with human evaluation. Our findings reveal that the knowledge in GLMs with the same backbone architecture adheres to the scaling law, and that tuning on instruction-following data may compromise the model's ability to generate factually correct text consistently.
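
As a much-simplified illustration of assessing a fact across diverse text forms, the snippet below averages a causal LM's log-probability of the correct answer over several paraphrased prompts. The GPT-2 model, the paraphrases, and the plain averaging are stand-ins; the actual KaRR metric is a latent-variable formulation, not this toy aggregate.

```python
# Toy sketch: score one fact under several surface forms instead of trusting
# a single prompt, by averaging log P(answer | prompt) over paraphrases.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def answer_logprob(prompt: str, answer: str) -> float:
    ids = tok(prompt + answer, return_tensors="pt").input_ids
    n_answer = len(tok(answer).input_ids)
    logits = lm(ids).logits[0, :-1].log_softmax(-1)   # position t predicts token t+1
    targets = ids[0, 1:]
    token_lp = logits.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp[-n_answer:].sum().item()          # log P(answer | prompt)

paraphrases = [
    "The capital of France is",
    "France's capital city is",
    "Q: What is the capital of France? A:",
]
scores = [answer_logprob(p, " Paris") for p in paraphrases]
print("mean log-probability of the fact:", sum(scores) / len(scores))
```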

Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge

May 13, 2023
Jiangjie Chen, Wei Shi, Ziquan Fu, Sijie Cheng, Lei Li, Yanghua Xiao

Large language models (LLMs) have been widely studied for their ability to store and utilize positive knowledge. However, negative knowledge, such as "lions don't live in the ocean", is also ubiquitous in the world but rarely mentioned explicitly in text. What do LLMs know about negative knowledge? This work examines the ability of LLMs to handle negative commonsense knowledge. We design a constrained keywords-to-sentence generation task (CG) and a Boolean question-answering task (QA) to probe LLMs. Our experiments reveal that LLMs frequently fail to generate valid sentences grounded in negative commonsense knowledge, yet they can correctly answer polar yes-or-no questions. We term this phenomenon the belief conflict of LLMs. Our further analysis shows that statistical shortcuts and negation reporting bias from language-modeling pre-training cause this conflict.

* Accepted to ACL 2023 
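
A toy version of the two probes might look like the following, where the same negative fact is posed once as a Boolean QA question and once as a constrained keywords-to-sentence (CG) task. The templates and the small model are illustrative, not the paper's exact prompts or evaluated LLMs.

```python
# Toy probes: the same negative fact queried as yes/no QA and as constrained
# keywords-to-sentence generation (CG).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # placeholder model

keywords = ["lions", "live", "ocean"]
qa_prompt = "Question: Do lions live in the ocean? Answer (yes or no):"
cg_prompt = (
    "Write one true sentence using all of these keywords: "
    + ", ".join(keywords) + ".\nSentence:"
)

for name, prompt in [("QA", qa_prompt), ("CG", cg_prompt)]:
    out = generator(prompt, max_new_tokens=20, do_sample=False)[0]["generated_text"]
    print(name, "->", out[len(prompt):].strip())
```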

HIORE: Leveraging High-order Interactions for Unified Entity Relation Extraction

May 07, 2023
Yijun Wang, Changzhi Sun, Yuanbin Wu, Lei Li, Junchi Yan, Hao Zhou

Entity relation extraction consists of two sub-tasks: entity recognition and relation extraction. Existing methods either tackle these two tasks separately or unify them with word-by-word interactions. In this paper, we propose HIORE, a new method for unified entity relation extraction. The key insight is to leverage high-order interactions, i.e., the complex associations among word pairs, which contain richer information than first-order word-by-word interactions. For this purpose, we first devise a W-shape DNN (WNet) to capture coarse-level high-order connections. Then, we build a heuristic high-order graph and further calibrate the representations with a graph neural network (GNN). Experiments on three benchmarks (ACE04, ACE05, SciERC) show that HIORE achieves state-of-the-art performance on relation extraction and an improvement of 1.1-1.8 F1 points over the prior best unified model.

* 10 pages 
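
To give a flavour of what "high-order" interactions mean here, the sketch below builds a representation for every word pair and then lets pairs that share a word exchange information before relation scoring. The pooling-based update is a crude stand-in for the paper's W-shape network and GNN over a heuristic graph; all dimensions and the relation count are made up.

```python
# Sketch: first-order word-pair table, then one step of interaction between
# pairs sharing a word, then relation scoring. Purely illustrative.
import torch
import torch.nn as nn

hidden, n_tokens = 32, 6
token_repr = torch.randn(n_tokens, hidden)            # encoder output (placeholder)

# First-order pair table: concat(head, tail) projected down.
pair_proj = nn.Linear(2 * hidden, hidden)
pairs = pair_proj(torch.cat([
    token_repr.unsqueeze(1).expand(-1, n_tokens, -1),
    token_repr.unsqueeze(0).expand(n_tokens, -1, -1),
], dim=-1))                                           # (n, n, hidden)

# High-order interaction: pair (i, j) also sees all pairs in row i and
# column j (pairs sharing a word), a crude stand-in for a GNN layer.
row_ctx = pairs.mean(dim=1, keepdim=True).expand_as(pairs)
col_ctx = pairs.mean(dim=0, keepdim=True).expand_as(pairs)
update = nn.Linear(3 * hidden, hidden)
pairs = torch.relu(update(torch.cat([pairs, row_ctx, col_ctx], dim=-1)))

relation_scores = nn.Linear(hidden, 5)(pairs)         # 5 hypothetical relation types
print(relation_scores.shape)                          # torch.Size([6, 6, 5])
```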

Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis

May 02, 2023
Wenhao Zhu, Hongyi Liu, Qingxiu Dong, Jingjing Xu, Shujian Huang, Lingpeng Kong, Jiajun Chen, Lei Li

Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT). In this paper, we systematically investigate the advantages and challenges of LLMs for MMT by answering two questions: 1) How well do LLMs perform in translating a massive number of languages? 2) Which factors affect LLMs' performance in translation? We evaluate popular LLMs, including XGLM, OPT, BLOOMZ, and ChatGPT, on 102 languages. Our empirical results show that even the best-performing model, ChatGPT, still lags behind the supervised baseline NLLB in 83.33% of translation directions. Through further analysis, we discover that LLMs exhibit new working patterns when used for MMT. First, prompt semantics can surprisingly be ignored when given in-context exemplars, where LLMs still show strong performance even with unreasonable prompts. Second, cross-lingual exemplars can provide better task instruction for low-resource translation than exemplars in the same language pair. Third, we observe the overestimated performance of BLOOMZ on the Flores-101 dataset, indicating the potential risk of using public datasets for evaluation.
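
The evaluation loop can be pictured roughly as below: translate with a few in-context exemplars, then score with BLEU. The XGLM-564M checkpoint, the German-English exemplars, and the two-sentence test set are placeholders for the paper's 102-language setup on Flores-101.

```python
# Sketch: few-shot prompted translation with a small causal LM, scored by BLEU.
import sacrebleu
from transformers import pipeline

generator = pipeline("text-generation", model="facebook/xglm-564M")

exemplars = (
    "German: Guten Morgen. = English: Good morning.\n"
    "German: Danke schön. = English: Thank you very much.\n"
)
test_src = ["Wie geht es dir?", "Das Wetter ist heute schön."]
test_ref = ["How are you?", "The weather is nice today."]

hypotheses = []
for src in test_src:
    prompt = exemplars + f"German: {src} = English:"
    out = generator(prompt, max_new_tokens=20, do_sample=False)[0]["generated_text"]
    hypotheses.append(out[len(prompt):].strip().split("\n")[0])

print("BLEU:", sacrebleu.corpus_bleu(hypotheses, [test_ref]).score)
```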

Importance Weighted Expectation-Maximization for Protein Sequence Design

Apr 30, 2023
Zhenqiao Song, Lei Li

Designing protein sequences with desired biological function is crucial in biology and chemistry. Recent machine learning methods use a surrogate sequence-function model to replace expensive wet-lab validation. How can we efficiently generate diverse and novel protein sequences with high fitness? In this paper, we propose IsEM-Pro, an approach to generate protein sequences towards a given fitness criterion. At its core, IsEM-Pro is a latent generative model, augmented by combinatorial structure features from separately learned Markov random fields (MRFs). We develop a Monte Carlo Expectation-Maximization (MCEM) method to learn the model. During inference, sampling from its latent space enhances diversity, while its MRF features guide exploration toward high-fitness regions. Experiments on eight protein sequence design tasks show that IsEM-Pro outperforms the previous best methods by at least 55% on average fitness score and generates more diverse and novel protein sequences.
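
As a loose intuition for the importance-weighted EM loop, the toy below samples sequences from a simple per-position categorical model, weights them by a surrogate fitness score, and refits the model on the weighted samples so probability mass moves toward high-fitness regions. The model, the fitness function, and the weighting temperature are all illustrative assumptions rather than IsEM-Pro's latent-variable model and MRF features.

```python
# Toy importance-weighted EM: sample, weight by surrogate fitness, refit.
import numpy as np

rng = np.random.default_rng(0)
alphabet = list("ACDEFGHIKLMNPQRSTVWY")      # 20 amino acids
seq_len, n_samples = 10, 500

def fitness(seq):                            # placeholder surrogate: reward alanines
    return sum(ch == "A" for ch in seq) / len(seq)

# Initial model: independent uniform categorical at each position.
probs = np.full((seq_len, len(alphabet)), 1.0 / len(alphabet))

for _ in range(20):
    # E-step: sample sequences and compute self-normalized importance weights.
    samples = [[rng.choice(len(alphabet), p=probs[i]) for i in range(seq_len)]
               for _ in range(n_samples)]
    scores = np.array([fitness([alphabet[a] for a in s]) for s in samples])
    weights = np.exp(scores * 5.0)
    weights /= weights.sum()
    # M-step: refit each position's categorical on the weighted samples.
    probs = np.full_like(probs, 1e-3)
    for w, s in zip(weights, samples):
        for i, a in enumerate(s):
            probs[i, a] += w
    probs /= probs.sum(axis=1, keepdims=True)

best = "".join(alphabet[np.argmax(p)] for p in probs)
print("mode sequence after training:", best, "fitness:", fitness(best))
```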

Influence of Myocardial Infarction on QRS Properties: A Simulation Study

Apr 21, 2023
Lei Li, Julia Camps, Zhinuo (Jenny) Wang, Abhirup Banerjee, Blanca Rodriguez, Vicente Grau

The interplay between structural and electrical changes in the heart after myocardial infarction (MI) plays a key role in the initiation and maintenance of arrhythmia. The anatomical and electrophysiological properties of scar, border zone, and normal myocardium modify the electrocardiographic morphology, which is routinely analysed in clinical settings. However, the influence of various MI properties on the QRS is not intuitively predictable. In this work, we have systematically investigated the effects of 17 post-MI scenarios, varying the location, size, transmural extent, and conduction properties of the scar and border zone area, on the forward-calculated QRS. Additionally, we have compared the contributions of different QRS score criteria for quantifying post-MI pathophysiology. The propagation of electrical activity in the ventricles is simulated via an Eikonal model on a unified coordinate system. The analysis has been performed on 49 subjects, and the results imply that the QRS is capable of identifying MI, suggesting the feasibility of inversely reconstructing infarct regions from the QRS. There exist sensitivity variations among different QRS criteria for identifying the 17 MI scenarios, which is informative for solving the inverse problem.

* 10 pages, accepted by FIMH 2022 
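
To convey the Eikonal idea in miniature, the sketch below computes activation times on a 1-D chain of nodes as shortest travel times from a pacing site, with conduction velocity lowered in a hypothetical scar segment. The real study uses a 3-D ventricular mesh, anatomically located scars and border zones, and forward-calculated QRS complexes; this toy only shows how local conduction changes shift activation times.

```python
# Toy graph-based eikonal approximation: activation time = shortest travel time
# from a pacing node, with slower conduction inside a hypothetical scar segment.
import heapq

n_nodes, spacing = 50, 1.0                      # mm between neighbouring nodes
velocity = [0.6] * n_nodes                      # m/s, healthy conduction
for i in range(20, 30):                         # hypothetical scar region
    velocity[i] = 0.1

def activation_times(source):
    times = [float("inf")] * n_nodes
    times[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        t, i = heapq.heappop(heap)
        if t > times[i]:
            continue
        for j in (i - 1, i + 1):
            if 0 <= j < n_nodes:
                # edge travel time: distance (m) / local conduction velocity (m/s)
                nt = t + (spacing / 1000.0) / (0.5 * (velocity[i] + velocity[j]))
                if nt < times[j]:
                    times[j] = nt
                    heapq.heappush(heap, (nt, j))
    return times

times = activation_times(source=0)
print(f"last node activates at {1000 * times[-1]:.1f} ms")
```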

Revisiting k-NN for Pre-trained Language Models

Apr 18, 2023
Lei Li, Jing Chen, Bozhong Tian, Ningyu Zhang

Pre-trained Language Models (PLMs), as parametric eager learners, have become the de facto choice for current paradigms of Natural Language Processing (NLP). In contrast, k-Nearest-Neighbor (k-NN) classifiers, as the lazy learning paradigm, tend to mitigate over-fitting and isolated noise. In this paper, we revisit k-NN classifiers for augmenting PLM-based classifiers. At the methodological level, we propose to adopt k-NN with textual representations of PLMs in two steps: (1) utilize k-NN as prior knowledge to calibrate the training process; (2) linearly interpolate the probability distribution predicted by k-NN with that of the PLM classifier. At the heart of our approach is the implementation of k-NN-calibrated training, which treats predicted results as indicators of easy versus hard examples during training. To cover diverse application scenarios, we conduct extensive experiments across fine-tuning and prompt-tuning paradigms and across zero-shot, few-shot, and fully-supervised settings on eight diverse end tasks. We hope our exploration will encourage the community to revisit the power of classical methods for efficient NLP. Code and datasets are available at https://github.com/zjunlp/Revisit-KNN.

* Work in progress 
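
The inference-time interpolation step is simple to sketch: build a label distribution from the k nearest training representations and mix it linearly with the classifier's softmax. The random features, k, and the mixing weight below are illustrative; the k-NN-calibrated training described above is not shown.

```python
# Sketch of step (2): linear interpolation of a k-NN label distribution with
# the classifier's predicted distribution. All tensors here are placeholders.
import torch

n_train, n_classes, dim, k, lam = 100, 3, 16, 8, 0.3
train_feats = torch.randn(n_train, dim)               # PLM features (placeholder)
train_labels = torch.randint(0, n_classes, (n_train,))

def knn_distribution(query):
    dists = torch.cdist(query.unsqueeze(0), train_feats).squeeze(0)
    neighbours = train_labels[dists.topk(k, largest=False).indices]
    return torch.bincount(neighbours, minlength=n_classes).float() / k

query_feat = torch.randn(dim)
classifier_probs = torch.softmax(torch.randn(n_classes), dim=-1)   # PLM head output

final_probs = lam * knn_distribution(query_feat) + (1 - lam) * classifier_probs
print("prediction:", final_probs.argmax().item(), final_probs)
```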