Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Text": models, code, and papers

ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning

Dec 30, 2020
Yujia Qin, Yankai Lin, Ryuichi Takanobu, Zhiyuan Liu, Peng Li, Heng Ji, Minlie Huang, Maosong Sun, Jie Zhou

Pre-trained Language Models (PLMs) have shown strong performance in various downstream Natural Language Processing (NLP) tasks. However, PLMs still cannot well capture the factual knowledge in the text, which is crucial for understanding the whole text, especially for document-level language understanding tasks. To address this issue, we propose a novel contrastive learning framework named ERICA in pre-training phase to obtain a deeper understanding of the entities and their relations in text. Specifically, (1) to better understand entities, we propose an entity discrimination task that distinguishes which tail entity can be inferred by the given head entity and relation. (2) Besides, to better understand relations, we employ a relation discrimination task which distinguishes whether two entity pairs are close or not in relational semantics. Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks, including relation extraction and reading comprehension, especially under low resource setting. Meanwhile, ERICA achieves comparable or better performance on sentence-level tasks. We will release the datasets, source codes and pre-trained language models for further research explorations.

* preprint 

  Access Paper or Ask Questions

M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training

Jun 04, 2020
Haoyang Huang, Lin Su, Di Qi, Nan Duan, Edward Cui, Taroon Bharti, Lei Zhang, Lijuan Wang, Jianfeng Gao, Bei Liu, Jianlong Fu, Dongdong Zhang, Xin Liu, Ming Zhou

This paper presents a Multitask Multilingual Multimodal Pre-trained model (M3P) that combines multilingual-monomodal pre-training and monolingual-multimodal pre-training into a unified framework via multitask learning and weight sharing. The model learns universal representations that can map objects that occurred in different modalities or expressed in different languages to vectors in a common semantic space. To verify the generalization capability of M3P, we fine-tune the pre-trained model for different types of downstream tasks: multilingual image-text retrieval, multilingual image captioning, multimodal machine translation, multilingual natural language inference and multilingual text generation. Evaluation shows that M3P can (i) achieve comparable results on multilingual tasks and English multimodal tasks, compared to the state-of-the-art models pre-trained for these two types of tasks separately, and (ii) obtain new state-of-the-art results on non-English multimodal tasks in the zero-shot or few-shot setting. We also build a new Multilingual Image-Language Dataset (MILD) by collecting large amounts of (text-query, image, context) triplets in 8 languages from the logs of a commercial search engine

* 10 pages,2 figures 

  Access Paper or Ask Questions

Is Word Segmentation Necessary for Deep Learning of Chinese Representations?

May 14, 2019
Yuxian Meng, Xiaoya Li, Xiaofei Sun, Qinghong Han, Arianna Yuan, Jiwei Li

Segmenting a chunk of text into words is usually the first step of processing Chinese text, but its necessity has rarely been explored. In this paper, we ask the fundamental question of whether Chinese word segmentation (CWS) is necessary for deep learning-based Chinese Natural Language Processing. We benchmark neural word-based models which rely on word segmentation against neural char-based models which do not involve word segmentation in four end-to-end NLP benchmark tasks: language modeling, machine translation, sentence matching/paraphrase and text classification. Through direct comparisons between these two types of models, we find that char-based models consistently outperform word-based models. Based on these observations, we conduct comprehensive experiments to study why word-based models underperform char-based models in these deep learning-based NLP tasks. We show that it is because word-based models are more vulnerable to data sparsity and the presence of out-of-vocabulary (OOV) words, and thus more prone to overfitting. We hope this paper could encourage researchers in the community to rethink the necessity of word segmentation in deep learning-based Chinese Natural Language Processing. \footnote{Yuxian Meng and Xiaoya Li contributed equally to this paper.}

* to appear at ACL2019 

  Access Paper or Ask Questions

IDEL: In-Database Entity Linking with Neural Embeddings

Mar 13, 2018
Torsten Kilias, Alexander Löser, Felix A. Gers, Richard Koopmanschap, Ying Zhang, Martin Kersten

We present a novel architecture, In-Database Entity Linking (IDEL), in which we integrate the analytics-optimized RDBMS MonetDB with neural text mining abilities. Our system design abstracts core tasks of most neural entity linking systems for MonetDB. To the best of our knowledge, this is the first defacto implemented system integrating entity-linking in a database. We leverage the ability of MonetDB to support in-database-analytics with user defined functions (UDFs) implemented in Python. These functions call machine learning libraries for neural text mining, such as TensorFlow. The system achieves zero cost for data shipping and transformation by utilizing MonetDB's ability to embed Python processes in the database kernel and exchange data in NumPy arrays. IDEL represents text and relational data in a joint vector space with neural embeddings and can compensate errors with ambiguous entity representations. For detecting matching entities, we propose a novel similarity function based on joint neural embeddings which are learned via minimizing pairwise contrastive ranking loss. This function utilizes a high dimensional index structures for fast retrieval of matching entities. Our first implementation and experiments using the WebNLG corpus show the effectiveness and the potentials of IDEL.

* This manuscript is a preprint for a paper submitted to VLDB2018 

  Access Paper or Ask Questions

Cross-Validated Tuning of Shrinkage Factors for MVDR Beamforming Based on Regularized Covariance Matrix Estimation

Apr 05, 2021
Lei Xie, Zishu He, Jun Tong, Jun Li, Jiangtao Xi

This paper considers the regularized estimation of covariance matrices (CM) of high-dimensional (compound) Gaussian data for minimum variance distortionless response (MVDR) beamforming. Linear shrinkage is applied to improve the accuracy and condition number of the CM estimate for low-sample-support cases. We focus on data-driven techniques that automatically choose the linear shrinkage factors for shrinkage sample covariance matrix ($\text{S}^2$CM) and shrinkage Tyler's estimator (STE) by exploiting cross validation (CV). We propose leave-one-out cross-validation (LOOCV) choices for the shrinkage factors to optimize the beamforming performance, referred to as $\text{S}^2$CM-CV and STE-CV. The (weighted) out-of-sample output power of the beamfomer is chosen as a proxy of the beamformer performance and concise expressions of the LOOCV cost function are derived to allow fast optimization. For the large system regime, asymptotic approximations of the LOOCV cost functions are derived, yielding the $\text{S}^2$CM-AE and STE-AE. In general, the proposed algorithms are able to achieve near-oracle performance in choosing the linear shrinkage factors for MVDR beamforming. Simulation results are provided for validating the proposed methods.

* To be submitted to the IEEE or Elsevier for possible publication 

  Access Paper or Ask Questions

Understanding Guided Image Captioning Performance across Domains

Dec 04, 2020
Edwin G. Ng, Bo Pang, Piyush Sharma, Radu Soricut

Image captioning models generally lack the capability to take into account user interest, and usually default to global descriptions that try to balance readability, informativeness, and information overload. On the other hand, VQA models generally lack the ability to provide long descriptive answers, while expecting the textual question to be quite precise. We present a method to control the concepts that an image caption should focus on, using an additional input called the guiding text that refers to either groundable or ungroundable concepts in the image. Our model consists of a Transformer-based multimodal encoder that uses the guiding text together with global and object-level image features to derive early-fusion representations used to generate the guided caption. While models trained on Visual Genome data have an in-domain advantage of fitting well when guided with automatic object labels, we find that guided captioning models trained on Conceptual Captions generalize better on out-of-domain images and guiding texts. Our human-evaluation results indicate that attempting in-the-wild guided image captioning requires access to large, unrestricted-domain training datasets, and that increased style diversity (even without increasing vocabulary size) is a key factor for improved performance.

* 13 pages, 7 figures 

  Access Paper or Ask Questions

Semantic Hypergraphs

Aug 28, 2019
Telmo Menezes, Camille Roth

Existing computational methods for the analysis of corpora of text in natural language are still far from approaching a human level of understanding. We attempt to advance the state of the art by introducing a model and algorithmic framework to transform text into recursively structured data. We apply this to the analysis of news titles extracted from a social news aggregation website. We show that a recursive ordered hypergraph is a sufficiently generic structure to represent significant number of fundamental natural language constructs, with advantages over conventional approaches such as semantic graphs. We present a pipeline of transformations from the output of conventional NLP algorithms to such hypergraphs, which we denote as semantic hypergraphs. The features of these transformations include the creation of new concepts from existing ones, the organisation of statements into regular structures of predicates followed by an arbitrary number of entities and the ability to represent statements about other statements. We demonstrate knowledge inference from the hypergraph, identifying claims and expressions of conflicts, along with their participating actors and topics. We show how this enables the actor-centric summarization of conflicts, comparison of topics of claims between actors and networks of conflicts between actors in the context of a given topic. On the whole, we propose a hypergraphic knowledge representation model that can be used to provide effective overviews of a large corpus of text in natural language.

  Access Paper or Ask Questions

Evaluating Computational Language Models with Scaling Properties of Natural Language

Jun 22, 2019
Shuntaro Takahashi, Kumiko Tanaka-Ishii

In this article, we evaluate computational models of natural language with respect to the universal statistical behaviors of natural language. Statistical mechanical analyses have revealed that natural language text is characterized by scaling properties, which quantify the global structure in the vocabulary population and the long memory of a text. We study whether five scaling properties (given by Zipf's law, Heaps' law, Ebeling's method, Taylor's law, and long-range correlation analysis) can serve for evaluation of computational models. Specifically, we test $n$-gram language models, a probabilistic context-free grammar (PCFG), language models based on Simon/Pitman-Yor processes, neural language models, and generative adversarial networks (GANs) for text generation. Our analysis reveals that language models based on recurrent neural networks (RNNs) with a gating mechanism (i.e., long short-term memory, LSTM; a gated recurrent unit, GRU; and quasi-recurrent neural networks, QRNNs) are the only computational models that can reproduce the long memory behavior of natural language. Furthermore, through comparison with recently proposed model-based evaluation methods, we find that the exponent of Taylor's law is a good indicator of model quality.

* 32 pages, accepted by Computational Linguistics 

  Access Paper or Ask Questions

Enhancing Item Response Theory for Cognitive Diagnosis

Jun 01, 2019
Song Cheng, Qi Liu

Cognitive diagnosis is a fundamental and crucial task in many educational applications, e.g., computer adaptive test and cognitive assignments. Item Response Theory (IRT) is a classical cognitive diagnosis method which can provide interpretable parameters (i.e., student latent trait, question discrimination, and difficulty) for analyzing student performance. However, traditional IRT ignores the rich information in question texts, cannot diagnose knowledge concept proficiency, and it is inaccurate to diagnose the parameters for the questions which only appear several times. To this end, in this paper, we propose a general Deep Item Response Theory (DIRT) framework to enhance traditional IRT for cognitive diagnosis by exploiting semantic representation from question texts with deep learning. In DIRT, we first use a proficiency vector to represent students' proficiency in knowledge concepts and embed question texts and knowledge concepts to dense vectors by Word2Vec. Then, we design a deep diagnosis module to diagnose parameters in traditional IRT by deep learning techniques. Finally, with the diagnosed parameters, we input them into the logistic-like formula of IRT to predict student performance. Extensive experimental results on real-world data clearly demonstrate the effectiveness and interpretation power of DIRT framework.

  Access Paper or Ask Questions