"Text": models, code, and papers

Are Classes Clusters?

Apr 16, 2021
Kees Varekamp

Sentence embedding models aim to provide general purpose embeddings for sentences. Most of the models studied in this paper claim to perform well on STS tasks - but they do not report on their suitability for clustering. This paper looks at four recent sentence embedding models (Universal Sentence Encoder (Cer et al., 2018), Sentence-BERT (Reimers and Gurevych, 2019), LASER (Artetxe and Schwenk, 2019), and DeCLUTR (Giorgi et al., 2020)). It gives a brief overview of the ideas behind their implementations. It then investigates how well topic classes in two text classification datasets (Amazon Reviews (Ni et al., 2019) and News Category Dataset (Misra, 2018)) map to clusters in their corresponding sentence embedding space. While the performance of the resulting classification model is far from perfect, it is better than random. This is interesting because the classification model has been constructed in an unsupervised way. The topic classes in these real life topic classification datasets can be partly reconstructed by clustering the corresponding sentence embeddings.

* 7 pages, 4 tables 
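
The idea of checking whether topic classes map to clusters can be illustrated with a minimal sketch. It assumes plain k-means and a majority-vote mapping from clusters to classes; the abstract does not state which clustering algorithm or mapping the paper uses, so both are illustrative choices. The toy 2-D vectors and the `books`/`music` labels are hypothetical stand-ins for real sentence embeddings and dataset classes.

```python
from collections import Counter


def kmeans(points, k, iters=20):
    """Plain k-means; centroids start at evenly spaced input points (an assumption)."""
    step = len(points) // k
    centroids = [list(points[i * step]) for i in range(k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def cluster_to_class_accuracy(assign, labels):
    """Label each cluster with its majority class, then score the induced classifier."""
    majority = {}
    for c in set(assign):
        votes = Counter(l for a, l in zip(assign, labels) if a == c)
        majority[c] = votes.most_common(1)[0][0]
    predicted = [majority[a] for a in assign]
    return sum(p == l for p, l in zip(predicted, labels)) / len(labels)


# Toy 2-D "embeddings" (hypothetical; real sentence vectors have hundreds of dimensions).
embeddings = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
labels = ["books", "books", "books", "music", "music", "books"]

assign = kmeans(embeddings, k=2)
accuracy = cluster_to_class_accuracy(assign, labels)
print(assign, accuracy)
```

Note that the majority vote uses the gold labels only to name and score the clusters; the clustering itself never sees them.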

UPB at SemEval-2021 Task 8: Extracting Semantic Information on Measurements as Multi-Turn Question Answering

Apr 09, 2021
Andrei-Marius Avram, George-Eduard Zaharia, Dumitru-Clementin Cercel, Mihai Dascalu

Extracting semantic information on measurements and counts is an important topic in terms of analyzing scientific discourses. The 8th task of SemEval-2021: Counts and Measurements (MeasEval) aimed to boost research in this direction by providing a new dataset on which participants train their models to extract meaningful information on measurements from scientific texts. The competition is composed of five subtasks that build on top of each other: (1) quantity span identification, (2) unit extraction from the identified quantities and their value modifier classification, (3) span identification for measured entities and measured properties, (4) qualifier span identification, and (5) relation extraction between the identified quantities, measured entities, measured properties, and qualifiers. We approached these challenges by first identifying the quantities, extracting their units of measurement, classifying them with corresponding modifiers, and afterwards using them to jointly solve the last three subtasks in a multi-turn question answering manner. Our best performing model obtained an overlapping F1-score of 36.91% on the test set.

* 5 pages, 3 figures, SemEval-2021 Workshop, ACL-IJCNLP 2021 

Advances in Multi-turn Dialogue Comprehension: A Survey

Mar 04, 2021
Zhuosheng Zhang, Hai Zhao

Training machines to understand natural language and interact with humans is an elusive and essential task in the field of artificial intelligence. In recent years, a diversity of dialogue systems has been designed with the rapid development of deep learning research, especially the recent pre-trained language models (PrLMs). Among these studies, the fundamental yet challenging part is dialogue comprehension, whose role is to teach the machines to read and comprehend the dialogue context before responding. In this paper, we review the previous methods from the perspective of dialogue modeling. We summarize the characteristics and challenges of dialogue comprehension in contrast to plain-text reading comprehension. Then, we discuss three typical patterns of dialogue modeling that are widely used in dialogue comprehension tasks such as response selection and conversational question answering, as well as dialogue-related language modeling techniques to enhance PrLMs in dialogue scenarios. Finally, we highlight the technical advances in recent years and point out the lessons we can learn from the empirical analysis and the prospects towards a new frontier of research.

Exploring Classic and Neural Lexical Translation Models for Information Retrieval: Interpretability, Effectiveness, and Efficiency Benefits

Feb 12, 2021
Leonid Boytsov, Zico Kolter

We study the utility of the lexical translation model (IBM Model 1) for English text retrieval, in particular, its neural variants that are trained end-to-end. We use the neural Model 1 as an aggregator layer applied to context-free or contextualized query/document embeddings. This new approach to designing a neural ranking system has benefits for effectiveness, efficiency, and interpretability. Specifically, we show that adding an interpretable neural Model 1 layer on top of BERT-based contextualized embeddings (1) does not decrease accuracy and/or efficiency; and (2) may overcome the limitation on the maximum sequence length of existing BERT models. The context-free neural Model 1 is less effective than a BERT-based ranking model, but it can run efficiently on a CPU (without expensive index-time precomputation or query-time operations on large tensors). Using Model 1, we produced the best neural and non-neural runs on the MS MARCO document ranking leaderboard in late 2020.

* ECIR 2021 (The 43rd European Conference on Information Retrieval) 
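
For readers unfamiliar with lexical translation models in retrieval, the classic (non-neural) Model 1 score can be sketched as below. This is the textbook formulation, in which each query term is "translated" from the document's terms and mixed with a background probability; it is not the paper's neural variant. The translation table `trans`, the `background` probabilities, the mixing weight `lam`, and the toy documents are all hypothetical values chosen for illustration.

```python
import math
from collections import Counter


def model1_log_score(query, doc, trans, background, lam=0.1):
    """log P(query | doc) under a smoothed IBM Model 1 style mixture."""
    counts = Counter(doc)
    total = len(doc)
    score = 0.0
    for q in query:
        # Sum over document terms d of T(q | d) * P(d | doc).
        p_trans = sum(trans.get((q, d), 0.0) * n / total for d, n in counts.items())
        # Mix with a background probability so unseen terms never yield log(0).
        score += math.log((1 - lam) * p_trans + lam * background.get(q, 1e-6))
    return score


# Hypothetical translation probabilities T(query_term | doc_term).
trans = {
    ("car", "car"): 0.7,
    ("car", "automobile"): 0.6,
    ("engine", "engine"): 0.8,
}
background = {"car": 0.01, "engine": 0.01}

query = ["car", "engine"]
doc_a = ["automobile", "engine", "repair"]
doc_b = ["banana", "bread", "recipe"]

score_a = model1_log_score(query, doc_a, trans, background)
score_b = model1_log_score(query, doc_b, trans, background)
print(score_a, score_b)
```

Because `doc_a` contains "automobile", it receives credit for the query term "car" even without an exact match, which is the vocabulary-gap benefit translation models are known for. The per-term translation probabilities are also what make the score inspectable.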

VINS: Visual Search for Mobile User Interface Design

Feb 10, 2021
Sara Bunian, Kai Li, Chaima Jemmali, Casper Harteveld, Yun Fu, Magy Seif El-Nasr

Searching for relevant mobile user interface (UI) design examples can aid interface designers in gaining inspiration and comparing design alternatives. However, finding such design examples is challenging, especially as current search systems rely only on text-based queries and do not take the UI structure and content into account. This paper introduces VINS, a visual search framework that takes as input a UI image (wireframe, high-fidelity) and retrieves visually similar design examples. We first survey interface designers to better understand their example-finding process. We then develop a large-scale UI dataset that provides an accurate specification of the interface's view hierarchy (i.e., all the UI components and their specific location). By utilizing this dataset, we propose an object-detection-based image retrieval framework that models the UI context and hierarchical structure. The framework achieves a mean Average Precision of 76.39% for the UI detection and high performance in querying similar UI designs.

Multilingual LAMA: Investigating Knowledge in Multilingual Pretrained Language Models

Feb 01, 2021
Nora Kassner, Philipp Dufter, Hinrich Schütze

Recently, it has been found that monolingual English language models can be used as knowledge bases. Instead of structural knowledge base queries, masked sentences such as "Paris is the capital of [MASK]" are used as probes. We translate the established benchmarks TREx and GoogleRE into 53 languages. Working with mBERT, we investigate three questions. (i) Can mBERT be used as a multilingual knowledge base? Most prior work only considers English. Extending research to multiple languages is important for diversity and accessibility. (ii) Is mBERT's performance as knowledge base language-independent or does it vary from language to language? (iii) A multilingual model is trained on more text, e.g., mBERT is trained on 104 Wikipedias. Can mBERT leverage this for better performance? We find that using mBERT as a knowledge base yields varying performance across languages and pooling predictions across languages improves performance. Conversely, mBERT exhibits a language bias; e.g., when queried in Italian, it tends to predict Italy as the country of origin.

* Accepted to EACL 2021 

Classifying Scientific Publications with BERT -- Is Self-Attention a Feature Selection Method?

Jan 20, 2021
Andres Garcia-Silva, Jose Manuel Gomez-Perez

We investigate the self-attention mechanism of BERT in a fine-tuning scenario for the classification of scientific articles over a taxonomy of research disciplines. We observe how self-attention focuses on words that are highly related to the domain of the article. Particularly, a small subset of vocabulary words tends to receive most of the attention. We compare and evaluate the subset of the most attended words with feature selection methods normally used for text classification in order to characterize self-attention as a possible feature selection approach. Using ConceptNet as ground truth, we also find that attended words are more related to the research fields of the articles. However, conventional feature selection methods are still a better option to learn classifiers from scratch. This result suggests that, while self-attention identifies domain-relevant terms, the discriminatory information in BERT is encoded in the contextualized outputs and the classification layer. It also raises the question whether injecting feature selection methods in the self-attention mechanism could further optimize single sequence classification using transformers.

* Paper accepted for publication at ECIR2021 

Cross-Document Language Modeling

Jan 02, 2021
Avi Caciularu, Arman Cohan, Iz Beltagy, Matthew E. Peters, Arie Cattan, Ido Dagan

We introduce a new pretraining approach for language models that are geared to support multi-document NLP tasks. Our cross-document language model (CD-LM) improves masked language modeling for these tasks with two key ideas. First, we pretrain with multiple related documents in a single input, via cross-document masking, which encourages the model to learn cross-document and long-range relationships. Second, extending the recent Longformer model, we pretrain with long contexts of several thousand tokens and introduce a new attention pattern that uses sequence-level global attention to predict masked tokens, while retaining the familiar local attention elsewhere. We show that our CD-LM sets new state-of-the-art results for several multi-text tasks, including cross-document event and entity coreference resolution, paper citation recommendation, and document plagiarism detection, while using a significantly reduced number of training parameters relative to prior works.

Quantum algorithms for spectral sums

Nov 12, 2020
Alessandro Luongo, Changpeng Shao

We propose and analyze new quantum algorithms for estimating the most common spectral sums of symmetric positive definite (SPD) matrices. For a function $f$ and a matrix $A \in \mathbb{R}^{n\times n}$, the spectral sum is defined as $S_f(A) :=\text{Tr}[f(A)] = \sum_j f(\lambda_j)$, where $\lambda_j$ are the eigenvalues of $A$. Examples of spectral sums are the von Neumann entropy, the trace of inverse, the log-determinant, and the Schatten-$p$ norm, where the latter does not require the matrix to be SPD. The fastest classical randomized algorithms for estimating these quantities have a runtime that depends at least linearly on the number of nonzero components of the matrix. Assuming quantum access to the matrix, our algorithms are sub-linear in the matrix size, and depend at most quadratically on other quantities, like the condition number and the approximation error, and thus can compete with most of the randomized and distributed classical algorithms proposed in recent literature. These algorithms can be used as subroutines for solving many practical problems, for which the estimation of a spectral sum often represents a computational bottleneck.
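
As a small worked illustration of the definition $S_f(A) = \sum_j f(\lambda_j)$, consider a hypothetical $2 \times 2$ SPD matrix (chosen here for easy arithmetic, not taken from the paper). Each quantity below is computed classically from the eigenvalues and can be checked against its direct definition.

```latex
\begin{align*}
A &= \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},
  \qquad \lambda_1 = 1,\quad \lambda_2 = 3 \\
% log-determinant: f(x) = \log x; check: \det A = 2 \cdot 2 - 1 \cdot 1 = 3
\log\det A &= \sum_j \log \lambda_j = \log 1 + \log 3 = \log 3 \\
% trace of inverse: f(x) = 1/x; check: A^{-1} = \tfrac{1}{3}\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}
\mathrm{Tr}[A^{-1}] &= \sum_j \lambda_j^{-1} = 1 + \tfrac{1}{3} = \tfrac{4}{3} \\
% Schatten-2 norm: f(x) = x^2, then a square root; check: Frobenius norm \sqrt{4 + 1 + 1 + 4}
\|A\|_{2} &= \Big(\sum_j \lambda_j^{2}\Big)^{1/2} = \sqrt{1 + 9} = \sqrt{10}
\end{align*}
```

For an SPD matrix the singular values coincide with the eigenvalues, which is why the Schatten norm can be written in terms of $\lambda_j$ here.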

Utterance-level Dialogue Understanding: An Empirical Study

Oct 22, 2020
Deepanway Ghosal, Navonil Majumder, Rada Mihalcea, Soujanya Poria

The recent abundance of conversational data on the Web and elsewhere calls for effective NLP systems for dialog understanding. Complete utterance-level understanding often requires context understanding, defined by nearby utterances. In recent years, a number of approaches have been proposed for various utterance-level dialogue understanding tasks. Most of these approaches account for the context for effective understanding. In this paper, we explore and quantify the role of context for different aspects of a dialogue, namely emotion, intent, and dialogue act identification, using state-of-the-art dialog understanding methods as baselines. Specifically, we employ various perturbations to distort the context of a given utterance and study its impact on the different tasks and baselines. This provides us with insights into the fundamental contextual controlling factors of different aspects of a dialogue. Such insights can inspire more effective dialogue understanding models, and provide support for future text generation approaches. The implementation pertaining to this work is available at
