"Text": models, code, and papers

Neural Language Models as Psycholinguistic Subjects: Representations of Syntactic State

Mar 08, 2019
Richard Futrell, Ethan Wilcox, Takashi Morita, Peng Qian, Miguel Ballesteros, Roger Levy

We deploy the methods of controlled psycholinguistic experimentation to shed light on the extent to which the behavior of neural network language models reflects incremental representations of syntactic state. To do so, we examine model behavior on artificial sentences containing a variety of syntactically complex structures. We test four models: two publicly available LSTM sequence models of English (Jozefowicz et al., 2016; Gulordava et al., 2018) trained on large datasets; an RNNG (Dyer et al., 2016) trained on a small, parsed dataset; and an LSTM trained on the same small corpus as the RNNG. We find evidence that the LSTMs trained on large datasets represent syntactic state over large spans of text in a way that is comparable to the RNNG, while the LSTM trained on the small dataset does not or does so only weakly.

* Accepted to NAACL 2019. Not yet edited into the camera-ready version 


Contrastive Training for Models of Information Cascades

Dec 11, 2018
Shaobin Xu, David A. Smith

This paper proposes a model of information cascades as directed spanning trees (DSTs) over observed documents. In addition, we propose a contrastive training procedure that exploits partial temporal ordering of node infections in lieu of labeled training links. This combination of model and unsupervised training makes it possible to improve on models that use infection times alone and to exploit arbitrary features of the nodes and of the text content of messages in information cascades. With only basic node and time lag features similar to previous models, the DST model achieves performance with unsupervised training comparable to strong baselines on a blog network inference task. Unsupervised training with additional content features achieves significantly better results, reaching half the accuracy of a fully supervised model.

* Accepted in AAAI-18 


Robots Learn Social Skills: End-to-End Learning of Co-Speech Gesture Generation for Humanoid Robots

Oct 30, 2018
Youngwoo Yoon, Woo-Ri Ko, Minsu Jang, Jaeyeon Lee, Jaehong Kim, Geehyuk Lee

Co-speech gestures enhance interaction experiences between humans as well as between humans and robots. Existing robots use rule-based speech-gesture association, but implementing such rules requires human labor and expert knowledge. We present a learning-based co-speech gesture generation model trained on 52 h of TED talks. The proposed end-to-end neural network model consists of an encoder for speech text understanding and a decoder that generates a sequence of gestures. The model successfully produces various gestures including iconic, metaphoric, deictic, and beat gestures. In a subjective evaluation, participants reported that the gestures were human-like and matched the speech content. We also demonstrate co-speech gesture generation with a NAO robot working in real time.

* 7 pages; video and dataset: 


Deep State Space Models for Unconditional Word Generation

Oct 28, 2018
Florian Schmidt, Thomas Hofmann

Autoregressive feedback is considered a necessity for successful unconditional text generation using stochastic sequence models. However, such feedback is known to introduce systematic biases into the training process and it obscures a principle of generation: committing to global information and forgetting local nuances. We show that a non-autoregressive deep state space model with a clear separation of global and local uncertainty can be built from only two ingredients: an independent noise source and a deterministic transition function. Recent advances in flow-based variational inference can be used to optimize an evidence lower bound without resorting to annealing, auxiliary losses or similar measures. The result is a highly interpretable generative model on par with comparable autoregressive models on the task of word generation.

* NIPS camera-ready version 


Dialogue Modeling Via Hash Functions

Oct 18, 2018
Sahil Garg, Irina Rish, Guillermo Cecchi, Shuyang Gao, Palash Goyal, Sarik Ghazarian, Greg Ver Steeg, Aram Galstyan

We propose a novel dialogue modeling framework that uses binary hashcodes as compressed text representations, allowing for efficient similarity search. We also propose a novel lower bound on the mutual information between the hashcodes of the two dialog agents. This bound serves as a model selection criterion for optimizing those representations towards better alignment between the dialog participants and higher predictability of one response from another, facilitating better dialog generation. Empirical evaluation on several datasets, from depression therapy sessions to Larry King TV show interviews and Twitter data, demonstrates that our hashing-based approach is competitive with state-of-the-art neural network based dialogue generation systems, often significantly outperforming them in terms of response quality and computational efficiency, especially on relatively small datasets.

* Presented at IJCAI-ICML 2018 Workshops. The paper is revised significantly with an addition of elaborate experimental analysis 


Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns

Oct 11, 2018
Kellie Webster, Marta Recasens, Vera Axelrod, Jason Baldridge

Coreference resolution is an important task for natural language understanding, and the resolution of ambiguous pronouns a longstanding challenge. Nonetheless, existing corpora do not capture ambiguous pronouns in sufficient volume or diversity to accurately indicate the practical utility of models. Furthermore, we find gender bias in existing corpora and systems favoring masculine entities. To address this, we present and release GAP, a gender-balanced labeled corpus of 8,908 ambiguous pronoun-name pairs sampled to provide diverse coverage of challenges posed by real-world text. We explore a range of baselines which demonstrate the complexity of the challenge, the best achieving just 66.9% F1. We show that syntactic structure and continuous neural models provide promising, complementary cues for approaching the challenge.


Machine Learning Suites for Online Toxicity Detection

Oct 03, 2018
David Noever

To identify and classify toxic online commentary, the modern tools of data science transform raw text into key features from which either thresholding or learning algorithms can make predictions for monitoring offensive conversations. We systematically evaluate 62 classifiers representing 19 major algorithmic families against features extracted from the Jigsaw dataset of Wikipedia comments. We compare the classifiers based on statistically significant differences in accuracy and relative execution time. Among these classifiers for identifying toxic comments, tree-based algorithms provide the most transparently explainable rules and rank-order the predictive contribution of each feature. Among 28 features of syntax, sentiment, emotion and outlier word dictionaries, a simple bad word list proves most predictive of offensive commentary.


SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation

Aug 28, 2018
Xinyi Wang, Hieu Pham, Zihang Dai, Graham Neubig

In this work, we examine methods for data augmentation for text-based tasks such as neural machine translation (NMT). We formulate the design of a data augmentation policy with desirable properties as an optimization problem, and derive a generic analytic solution. This solution not only subsumes some existing augmentation schemes, but also leads to an extremely simple data augmentation strategy for NMT: randomly replacing words in both the source sentence and the target sentence with other random words from their corresponding vocabularies. We name this method SwitchOut. Experiments on three translation datasets of different scales show that SwitchOut yields consistent improvements of about 0.5 BLEU, performing comparably to or better than strong alternatives such as word dropout (Sennrich et al., 2016a). Code to implement this method is included in the appendix.

* Accepted as a short paper at the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018) 
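
The replacement strategy described in the abstract above can be illustrated with a minimal sketch. This is not the authors' appendix code: the function names `replace_random_words` and `augment_pair`, the `replace_prob` parameter, and the use of a fixed, independent per-token replacement probability are all assumptions made for illustration. The paper derives its sampling scheme from an optimization problem, which this sketch does not reproduce.

```python
import random


def replace_random_words(tokens, vocab, replace_prob, rng):
    """Return a copy of `tokens` where each token is swapped, with
    probability `replace_prob`, for a word drawn uniformly from `vocab`.

    Assumption: a fixed per-token probability stands in for the paper's
    derived sampling distribution.
    """
    augmented = []
    for token in tokens:
        if rng.random() < replace_prob:
            augmented.append(rng.choice(vocab))
        else:
            augmented.append(token)
    return augmented


def augment_pair(src, tgt, src_vocab, tgt_vocab, replace_prob=0.1, seed=None):
    """Apply the replacement to both sides of a sentence pair, each side
    drawing replacements from its own vocabulary."""
    rng = random.Random(seed)
    new_src = replace_random_words(src, src_vocab, replace_prob, rng)
    new_tgt = replace_random_words(tgt, tgt_vocab, replace_prob, rng)
    return new_src, new_tgt
```

Calling `augment_pair(["the", "cat"], ["le", "chat"], src_vocab, tgt_vocab, replace_prob=0.1, seed=0)` would return a new source/target pair of the same lengths, with some words possibly swapped for random vocabulary entries.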


Neural language representations predict outcomes of scientific research

May 17, 2018
James P. Bagrow, Daniel Berenberg, Joshua Bongard

Many research fields codify their findings in standard formats, often by reporting correlations between quantities of interest. But the space of all testable correlates is far larger than scientific resources can currently address, so the ability to accurately predict correlations would be useful to plan research and allocate resources. Using a dataset of approximately 170,000 correlational findings extracted from leading social science journals, we show that a trained neural network can accurately predict the reported correlations using only the text descriptions of the correlates. Accurate predictive models such as these can guide scientists towards promising untested correlates, better quantify the information gained from new findings, and have implications for moving artificial intelligence systems from predicting structures to predicting relationships in the real world.

* 8 pages, 3 figures, plus supporting material 


Object Activity Scene Description, Construction and Recognition

May 01, 2018
Hui Feng, Shanshan Wang, Shuzhi Sam Ge

Action recognition is a critical task for social robots to meaningfully engage with their environment. 3D human skeleton-based action recognition has been an attractive research area in recent years. Although existing approaches are good at action recognition, recognizing a group of actions in an activity scene remains a great challenge. To tackle this problem, we first partition the scene into several primitive actions (PAs) based on a motion attention mechanism. Then, the primitive actions are described by the trajectory vectors of the corresponding joints. After that, motivated by text classification based on word embeddings, we employ a convolutional neural network (CNN) to recognize activity scenes by treating the motion of joints as the "words" of an activity. The experimental results on the scenes of human activity dataset show the efficiency of the proposed approach.

* 13 pages, 9 figures 
