
"Text": models, code, and papers

A Generalist Agent

May 12, 2022
Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, Tom Eccles, Jake Bruce, Ali Razavi, Ashley Edwards, Nicolas Heess, Yutian Chen, Raia Hadsell, Oriol Vinyals, Mahyar Bordbar, Nando de Freitas

Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato.
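The single-sequence design described above can be sketched in a few lines: every modality is mapped into one flat token stream that a single transformer policy consumes. This is an illustrative toy, not Gato's actual tokenizer; the vocabulary offsets, bin count, and function names are assumptions made for the example.

```python
# Hypothetical token ranges; the real model's vocabulary layout differs.
TEXT_VOCAB = 32000          # text tokens occupy [0, 32000)
IMAGE_OFFSET = 32000        # image patch tokens follow text tokens
CONT_OFFSET = 33024         # discretized continuous values (1024 bins)

def tokenize_text(text_ids):
    # text ids are assumed to already lie in [0, TEXT_VOCAB)
    return list(text_ids)

def tokenize_image_patches(patch_ids):
    # shift patch ids into their own token range
    return [IMAGE_OFFSET + p for p in patch_ids]

def tokenize_continuous(values, lo=-1.0, hi=1.0, bins=1024):
    """Uniformly discretize continuous values (e.g. joint torques) into bins."""
    toks = []
    for v in values:
        v = min(max(v, lo), hi)                      # clamp to range
        b = int((v - lo) / (hi - lo) * (bins - 1))   # bin index
        toks.append(CONT_OFFSET + b)
    return toks

def build_episode_sequence(observations):
    """Interleave tokenized observations of any modality into one sequence."""
    seq = []
    for modality, data in observations:
        if modality == "text":
            seq.extend(tokenize_text(data))
        elif modality == "image":
            seq.extend(tokenize_image_patches(data))
        elif modality == "continuous":
            seq.extend(tokenize_continuous(data))
    return seq
```

Because every modality lands in a disjoint token range, the same network can decide from context which kind of token to emit next.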


Memotion Analysis through the Lens of Joint Embedding

Dec 03, 2021
Nethra Gunti, Sathyanarayanan Ramamoorthy, Parth Patwa, Amitava Das

Joint embedding (JE) is a way to encode multi-modal data into a vector space in which text serves as the grounding key and other modalities, such as images, are anchored to those keys. A meme is typically an image with text embedded onto it. Although memes are commonly used for fun, they can also be used to spread hate and misinformation. That, together with their growing ubiquity across social platforms, has made automatic analysis of memes a widespread research topic. In this paper, we report our initial experiments on the Memotion Analysis problem through joint embeddings. Our results are within a small margin of the state of the art.

* Accepted as Student Abstract at AAAI-22 
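The anchoring idea above, projecting image features into a space where text embeddings act as keys, can be illustrated with a minimal sketch. The linear projection and cosine scoring here are generic choices for the example, not necessarily the authors' exact setup.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors in the shared space."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def project(vec, weights):
    """Linear projection of a modality-specific vector into the shared space."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

# Usage: score how well an image embedding anchors to a text key.
text_key = [1.0, 2.0]                          # grounding text embedding
image_vec = project([1.0, 2.0], [[1.0, 0.0],   # hypothetical projection
                                 [0.0, 1.0]])
score = cosine(text_key, image_vec)
```

A real system would learn the projection weights so that matching image/text pairs score high and mismatched pairs score low.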


Modeling Event Salience in Narratives via Barthes' Cardinal Functions

Nov 03, 2020
Takaki Otake, Sho Yokoi, Naoya Inoue, Ryo Takahashi, Tatsuki Kuribayashi, Kentaro Inui

Events in a narrative differ in salience: some are more important to the story than others. Estimating event salience is useful for tasks such as story generation, and as a tool for text analysis in narratology and folkloristics. To compute event salience without any annotations, we adopt Barthes' definition of event salience and propose several unsupervised methods that require only a pre-trained language model. Evaluating the proposed methods on folktales with event salience annotation, we show that they outperform baseline methods and find that fine-tuning a language model on narrative texts is a key factor in improving them.

* Accepted to COLING 2020 


Extracting Summary Knowledge Graphs from Long Documents

Sep 19, 2020
Zeqiu Wu, Rik Koncel-Kedziorski, Mari Ostendorf, Hannaneh Hajishirzi

Knowledge graphs capture entities and relations from long documents and can facilitate reasoning in many downstream applications. Extracting compact knowledge graphs containing only salient entities and relations is important but challenging for understanding and summarizing long documents. We introduce a new text-to-graph task of predicting summarized knowledge graphs from long documents. We develop a dataset of 200k document/graph pairs using automatic and human annotations. We also develop strong baselines for this task based on graph learning and text summarization, and provide quantitative and qualitative studies of their efficacy.


Nakdan: Professional Hebrew Diacritizer

May 07, 2020
Avi Shmidman, Shaltiel Shmidman, Moshe Koppel, Yoav Goldberg

We present a system for automatic diacritization of Hebrew text. The system combines modern neural models with carefully curated declarative linguistic knowledge and comprehensive manually constructed tables and dictionaries. Besides providing state-of-the-art diacritization accuracy, the system also supports an interface for manual editing and correction of the automatic output, and has several features which make it particularly useful for the preparation of scientific editions of Hebrew texts. The system supports Modern Hebrew, Rabbinic Hebrew and Poetic Hebrew. The system is freely accessible for all use at

* Accepted to ACL 2020, System Demonstrations 


Classifying topics in speech when all you have is crummy translations

Aug 29, 2019
Sameer Bansal, Herman Kamper, Adam Lopez, Sharon Goldwater

Given a large amount of unannotated speech in a language with few resources, can we classify the speech utterances by topic? We show that this is possible if text translations are available for just a small amount of speech (less than 20 hours), using a recent model for direct speech-to-text translation. While the translations are poor, they are still good enough to correctly classify 1-minute speech segments over 70% of the time - a 20% improvement over a majority-class baseline. Such a system might be useful for humanitarian applications like crisis response, where incoming speech must be quickly assessed for further action.
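The pipeline above, translate first and then classify the noisy output by topic, can be approximated with something as simple as keyword matching over the translations. The topics, keyword sets, and function names below are hypothetical examples, not the paper's classifier.

```python
# Hypothetical topic lexicons for a crisis-response setting.
TOPIC_KEYWORDS = {
    "health": {"sick", "doctor", "medicine", "hospital"},
    "food": {"rice", "hungry", "water", "harvest"},
}

def classify_topic(noisy_translation, topic_keywords=TOPIC_KEYWORDS,
                   default="other"):
    """Pick the topic whose keyword set overlaps the translation most.
    Set-based matching tolerates the word errors of a poor translation."""
    words = set(noisy_translation.lower().split())
    scores = {t: len(words & kws) for t, kws in topic_keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

Even with ungrammatical translations, enough topical words usually survive for this kind of overlap scoring to beat a majority-class baseline.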


RaKUn: Rank-based Keyword extraction via Unsupervised learning and Meta vertex aggregation

Jul 15, 2019
Blaž Škrlj, Andraž Repar, Senja Pollak

Keyword extraction is used for summarizing the content of a document and supports efficient document retrieval, and is therefore an indispensable part of modern text-based systems. We explore how load centrality, a graph-theoretic measure applied to graphs derived from a given text, can be used to efficiently identify and rank keywords. By introducing meta vertices (aggregates of existing vertices) and systematic redundancy filters, the proposed method performs on par with the state of the art for the keyword extraction task on 14 diverse datasets. The proposed method is unsupervised, interpretable and can also be used for document visualization.
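The graph-based ranking step can be sketched as follows. For simplicity this toy uses weighted degree centrality as a stand-in for the load centrality used by RaKUn, and omits meta vertices and redundancy filters.

```python
from collections import defaultdict

def keyword_scores(text, window=2):
    """Build a word co-occurrence graph over a sliding window and rank
    words by weighted degree (a simple stand-in for load centrality)."""
    words = [w.strip(".,").lower() for w in text.split()]
    weight = defaultdict(int)
    for i, w in enumerate(words):
        # connect each word to neighbors within the window
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                weight[(w, words[j])] += 1
                weight[(words[j], w)] += 1
    degree = defaultdict(int)
    for (u, _v), c in weight.items():
        degree[u] += c
    return sorted(degree.items(), key=lambda kv: -kv[1])
```

Words that co-occur with many different words across the document accumulate high centrality and surface as keyword candidates.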


Authorship Attribution Using the Chaos Game Representation

Feb 14, 2018
Daniel Lichtblau, Catalin Stoean

The Chaos Game Representation, a method for creating images from nucleotide sequences, is modified to make images from chunks of text documents. Machine learning methods are then applied to train classifiers based on authorship. Experiments are conducted on several benchmark data sets in English, including the widely used Federalist Papers, and one in Portuguese. Validation results for the trained classifiers are competitive with the best methods in prior literature. The methodology is also successfully applied for text categorization with encouraging results. One classifier method is moreover seen to hold promise for the task of digital fingerprinting.
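The Chaos Game Representation itself is simple to state: assign each symbol a corner of the unit square, then for each symbol in the sequence move the current point halfway toward that symbol's corner. The sketch below uses the classic nucleotide corners; the paper's adaptation to text chunks would require an analogous mapping from characters to corners.

```python
# Classic CGR corner assignment for nucleotides.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def chaos_game_points(sequence, corners=CORNERS):
    """Map a symbol sequence to 2-D points: each symbol moves the current
    point halfway toward that symbol's corner of the unit square."""
    x, y = 0.5, 0.5          # start at the center
    points = []
    for s in sequence:
        cx, cy = corners[s]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points
```

Rendering the resulting point cloud as an image gives the representation on which image classifiers are then trained.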


Modulating and attending the source image during encoding improves Multimodal Translation

Dec 09, 2017
Jean-Benoit Delbrouck, Stéphane Dupont

We propose a new and fully end-to-end approach for multimodal translation in which the source text encoder modulates the entire visual input processing using conditional batch normalization, in order to compute the most informative image features for our task. Additionally, we propose a new attention mechanism derived from this idea, where the attention model for the visual input is conditioned on the source text encoder representations. In the paper, we detail our models as well as the image analysis pipeline, and report experimental results that are, as far as we know, the new state of the art on three different test sets.

* Accepted at the Visually-Grounded Interaction and Language Workshop, NIPS 2017 
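Conditional batch normalization replaces batch norm's learned scale and shift with values predicted from another input, here the source-text encoding. The sketch below is a generic pure-Python illustration with hypothetical parameter shapes, not the authors' implementation.

```python
import math

def conditional_batch_norm(batch, text_vec,
                           W_gamma, b_gamma, W_beta, b_beta, eps=1e-5):
    """batch: list of samples, each a list of channel values.
    The per-channel scale (gamma) and shift (beta) are predicted linearly
    from the text encoding, so the text modulates the visual features."""
    channels = len(batch[0])
    gamma = [b_gamma[c] + sum(w * t for w, t in zip(W_gamma[c], text_vec))
             for c in range(channels)]
    beta = [b_beta[c] + sum(w * t for w, t in zip(W_beta[c], text_vec))
            for c in range(channels)]
    # normalize each channel over the batch, then apply predicted gamma/beta
    means = [sum(x[c] for x in batch) / len(batch) for c in range(channels)]
    vars_ = [sum((x[c] - means[c]) ** 2 for x in batch) / len(batch)
             for c in range(channels)]
    return [[gamma[c] * (x[c] - means[c]) / math.sqrt(vars_[c] + eps) + beta[c]
             for c in range(channels)]
            for x in batch]
```

With zero projection weights and unit bias this reduces to plain batch normalization; nonzero weights let the text shift and rescale each visual channel.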
