"speech": models, code, and papers

Unsupervised Abstractive Meeting Summarization with Multi-Sentence Compression and Budgeted Submodular Maximization

May 14, 2018
Guokan Shang, Wensi Ding, Zekun Zhang, Antoine Jean-Pierre Tixier, Polykarpos Meladianos, Michalis Vazirgiannis, Jean-Pierre Lorré

We introduce a novel graph-based framework for abstractive meeting speech summarization that is fully unsupervised and does not rely on any annotations. Our work combines the strengths of multiple recent approaches while addressing their weaknesses. Moreover, we leverage recent advances in word embeddings and graph degeneracy applied to NLP to take exterior semantic knowledge into account, and to design custom diversity and informativeness measures. Experiments on the AMI and ICSI corpus show that our system improves on the state-of-the-art. Code and data are publicly available, and our system can be interactively tested.

* ACL 2018 Camera Ready 


Globally Normalized Transition-Based Neural Networks

Jun 08, 2016
Daniel Andor, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman Ganchev, Slav Petrov, Michael Collins

We introduce a globally normalized transition-based neural network model that achieves state-of-the-art part-of-speech tagging, dependency parsing and sentence compression results. Our model is a simple feed-forward neural network that operates on a task-specific transition system, yet achieves comparable or better accuracies than recurrent models. We discuss the importance of global as opposed to local normalization: a key insight is that the label bias problem implies that globally normalized models can be strictly more expressive than locally normalized models.


Improvising Linguistic Style: Social and Affective Bases for Agent Personality

Feb 26, 1997
Marilyn A. Walker, Janet E. Cahn, Stephen J. Whittaker

This paper introduces Linguistic Style Improvisation, a theory and set of algorithms for improvisation of spoken utterances by artificial agents, with applications to interactive story and dialogue systems. We argue that linguistic style is a key aspect of character, and show how speech act representations common in AI can provide abstract representations from which computer characters can improvise. We show that the mechanisms proposed introduce the possibility of socially oriented agents, meet the requirements that lifelike characters be believable, and satisfy particular criteria for improvisation proposed by Hayes-Roth.

* Proceedings of the First International Conference on Autonomous Agents, Marina del Rey, California, USA. 1997. pp 96-105 
* 10 pages, uses aaai.sty, lingmacros.sty, psfig.sty 


Tagging French -- comparing a statistical and a constraint-based method

Mar 02, 1995
Jean-Pierre Chanod, Pasi Tapanainen

In this paper we compare two competing approaches to part-of-speech tagging, statistical and constraint-based disambiguation, using French as our test language. We imposed a time limit on our experiment: the amount of time spent on the design of our constraint system was about the same as the time we used to train and test the easy-to-implement statistical model. We describe the two systems and compare the results. The accuracy of the statistical method is reasonably good, comparable to taggers for English. But the constraint-based tagger seems to be superior even with the limited time we allowed ourselves for rule development.

* Seventh Conference of the European Chapter of the ACL (EACL95). 149-156. ACL, Dublin, 1995. 
* in Proceedings of EACL-95, uuencoded gzipped postscript 


Re-framing Incremental Deep Language Models for Dialogue Processing with Multi-task Learning

Nov 13, 2020
Morteza Rohanian, Julian Hough

We present a multi-task learning framework to enable the training of one universal incremental dialogue processing model with four tasks of disfluency detection, language modelling, part-of-speech tagging, and utterance segmentation in a simple deep recurrent setting. We show that these tasks provide positive inductive biases to each other with the optimal contribution of each one relying on the severity of the noise from the task. Our live multi-task model outperforms similar individual tasks, delivers competitive performance, and is beneficial for future use in conversational agents in psychiatric treatment.

* The 28th International Conference on Computational Linguistics (COLING 2020) 


Presenting Simultaneous Translation in Limited Space

Sep 18, 2020
Dominik Macháček, Ondřej Bojar

Some methods of automatic simultaneous translation of long-form speech allow revisions of outputs, trading accuracy for low latency. Deploying these systems for users faces the problem of presenting subtitles in a limited space, such as two lines on a television screen. The subtitles must be shown promptly, incrementally, and with adequate time for reading. We provide an algorithm for subtitling. Furthermore, we propose a way to estimate the overall usability of the combination of automatic translation and subtitling by measuring the quality, latency, and stability on a test set, and propose an improved measure for translation latency.

* ITAT WAFNL 2020 


Towards Understanding Language through Perception in Situated Human-Robot Interaction: From Word Grounding to Grammar Induction

Dec 12, 2018
Amir Aly, Tadahiro Taniguchi

Robots are widely collaborating with human users in different tasks that require high-level cognitive functions to make them able to discover the surrounding environment. A difficult challenge that we briefly highlight in this short paper is inferring the latent grammatical structure of language, which includes grounding parts of speech (e.g., verbs, nouns, adjectives, and prepositions) through visual perception, and induction of Combinatory Categorial Grammar (CCG) for phrases. This paves the way towards grounding phrases so as to make a robot able to understand human instructions appropriately during interaction.

* Proceedings of the International Conference on Social Cognition in Humans and Robots (socSMCs), Germany, 2018 


Understanding Learning Dynamics Of Language Models with SVCCA

Nov 01, 2018
Naomi Saphra, Adam Lopez

Recent work has demonstrated that neural language models encode linguistic structure implicitly in a number of ways. However, existing research has not shed light on the process by which this structure is acquired during training. We use SVCCA as a tool for understanding how a language model is implicitly predicting a variety of word cluster tags. We present experiments suggesting that a single recurrent layer of a language model learns linguistic structure in phases. We find, for example, that a language model naturally stabilizes its representation of part of speech earlier than it learns semantic and topic information.


Neural Architecture Search: A Survey

Sep 05, 2018
Thomas Elsken, Jan Hendrik Metzen, Frank Hutter

Deep Learning has enabled remarkable progress over the last years on a variety of tasks, such as image recognition, speech recognition, and machine translation. One crucial aspect for this progress are novel neural architectures. Currently employed architectures have mostly been developed manually by human experts, which is a time-consuming and error-prone process. Because of this, there is growing interest in automated neural architecture search methods. We provide an overview of existing work in this field of research and categorize them according to three dimensions: search space, search strategy, and performance estimation strategy.


VnCoreNLP: A Vietnamese Natural Language Processing Toolkit

Apr 01, 2018
Thanh Vu, Dat Quoc Nguyen, Dai Quoc Nguyen, Mark Dras, Mark Johnson

We present an easy-to-use and fast toolkit, namely VnCoreNLP---a Java NLP annotation pipeline for Vietnamese. Our VnCoreNLP supports key natural language processing (NLP) tasks including word segmentation, part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing, and obtains state-of-the-art (SOTA) results for these tasks. We release VnCoreNLP to provide rich linguistic annotations to facilitate research work on Vietnamese NLP. Our VnCoreNLP is open-source and available at:

* Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, NAACL 2018, to appear 
