
"speech": models, code, and papers

A Research Platform for Multi-Robot Dialogue with Humans

Oct 12, 2019
Matthew Marge, Stephen Nogar, Cory J. Hayes, Stephanie M. Lukin, Jesse Bloecker, Eric Holder, Clare Voss

This paper presents a research platform that supports spoken dialogue interaction with multiple robots. The demonstration showcases our MultiBot testing scenario, in which users can verbally issue search, navigate, and follow instructions to two robotic teammates: a simulated ground robot and an aerial robot. This flexible language and robotics platform takes advantage of existing tools for speech recognition and dialogue management that are compatible with new domains, and implements an inter-agent communication protocol (tactical behavior specification) in which verbal instructions are encoded as tasks assigned to the appropriate robot.

* Accepted for publication at NAACL 2019; also presented at AI-HRI 2019 (arXiv:1909.04812) 

From Textual Information Sources to Linked Data in the Agatha Project

Sep 03, 2019
Paulo Quaresma, Vitor Beires Nogueira, Kashyap Raiyani, Roy Bayot, Teresa Gonçalves

Automatic reasoning about textual information is a challenging task for modern Natural Language Processing (NLP) systems. In this work we describe our proposal for representing and reasoning about Portuguese documents by means of Linked Data resources such as ontologies and thesauri. Our approach relies on a specialized natural language processing pipeline (part-of-speech tagging, named entity recognition, semantic role labeling) to populate an ontology for the domain of criminal investigations. The proposed architecture and ontology are language-independent; although some of the NLP modules are language-dependent, they can be built using appropriate AI methodologies.
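
The staged pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the Agatha implementation: the stage functions (`pos_tag`, `recognize_entities`, `label_roles`) are hypothetical stubs standing in for real Portuguese POS-tagging, NER, and semantic-role-labeling components.

```python
def pos_tag(tokens):
    """Stub POS tagger: a real system would use a trained Portuguese tagger."""
    return [(t, "PROPN" if t[:1].isupper() else "X") for t in tokens]

def recognize_entities(tagged):
    """Stub NER: promote proper nouns to entities."""
    return [tok for tok, tag in tagged if tag == "PROPN"]

def label_roles(tokens, entities):
    """Stub semantic role labeler: treat the first entity as the agent."""
    return {"agent": entities[0]} if entities else {}

def populate_ontology(sentence):
    """Run the staged pipeline and emit facts for the domain ontology."""
    tokens = sentence.split()
    tagged = pos_tag(tokens)
    entities = recognize_entities(tagged)
    roles = label_roles(tokens, entities)
    return {"entities": entities, "roles": roles}
```

The point of the sketch is the architecture: each language-dependent stage sits behind a language-independent interface, so swapping in modules for another language leaves `populate_ontology` and the ontology schema untouched.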

* Part of DECLARE 19 proceedings 

Autoencoder Based Architecture For Fast & Real Time Audio Style Transfer

Dec 26, 2018
Dhruv Ramani, Samarjit Karmakar, Anirban Panda, Asad Ahmed, Pratham Tangri

Recently, there has been great interest in audio style transfer, where a stylized audio signal is generated by imposing the style of a reference audio on the content of a target audio. We improve on current approaches, which use neural networks to extract the content and the style of the audio signal, and propose a new autoencoder-based architecture for the task. This network generates a stylized audio signal for a given content audio in a single forward pass. The proposed architecture is advantageous in terms of both the quality of the audio produced and the time taken to train the network. We experiment with the network on speech signals to confirm the validity of our proposal.

Charagram: Embedding Words and Sentences via Character n-grams

Jul 10, 2016
John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu

We present Charagram embeddings, a simple approach for learning character-based compositional models to embed textual sequences. A word or sentence is represented using a character n-gram count vector, followed by a single nonlinear transformation to yield a low-dimensional embedding. We use three tasks for evaluation: word similarity, sentence similarity, and part-of-speech tagging. We demonstrate that Charagram embeddings outperform more complex architectures based on character-level recurrent and convolutional neural networks, achieving new state-of-the-art performance on several similarity tasks.
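
The embedding described above (a character n-gram count vector followed by a single nonlinear transformation) can be sketched in a few lines. This is an illustrative re-implementation, not the authors' code; the n-gram range, the `#` word-boundary markers, and the `tanh` nonlinearity are assumptions for the sketch.

```python
import numpy as np

def char_ngrams(text, n_min=2, n_max=4):
    """Extract character n-grams from a string, with boundary markers."""
    padded = "#" + text + "#"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams

def charagram_embed(text, vocab, W, b):
    """Count vector of character n-grams, then one nonlinear transformation."""
    counts = np.zeros(len(vocab))
    for g in char_ngrams(text):
        if g in vocab:          # n-grams outside the vocabulary are dropped
            counts[vocab[g]] += 1
    return np.tanh(W @ counts + b)   # the single nonlinearity
```

Because the representation is just a bag of character n-grams, morphologically related words (e.g. sharing a stem) overlap in many counts before the projection, which is what lets such a simple model compete with character-level RNNs and CNNs on similarity tasks.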

A Semisupervised Approach for Language Identification based on Ladder Networks

Apr 01, 2016
Ehud Ben-Reuven, Jacob Goldberger

In this study we address the problem of training a neural network for language identification using both labeled and unlabeled speech samples in the form of i-vectors. We propose a neural network architecture that can also handle out-of-set languages. We utilize a modified version of the recently proposed Ladder Network semisupervised training procedure, which optimizes the reconstruction costs of a stack of denoising autoencoders. We show that this approach can be successfully applied to the case where the training dataset is composed of both labeled and unlabeled acoustic data. The results show enhanced language identification on the NIST 2015 language identification dataset.
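
The training objective described above — a supervised classification loss on the labeled i-vectors plus denoising-autoencoder reconstruction costs over all data — can be sketched as follows. This is a simplified single-layer illustration with hypothetical `encode`, `decode`, and `predict` functions, not the full Ladder Network with its per-layer lateral connections.

```python
import numpy as np

def denoising_recon_cost(x, encode, decode, noise_std=0.3, seed=0):
    """Denoising-autoencoder cost: corrupt the input, encode and decode it,
    and compare the reconstruction to the clean input."""
    rng = np.random.default_rng(seed)
    x_noisy = x + noise_std * rng.standard_normal(x.shape)
    return np.mean((decode(encode(x_noisy)) - x) ** 2)

def semisupervised_loss(labeled, unlabeled, predict, encode, decode, lam=1.0):
    """Cross-entropy on labeled samples plus a weighted reconstruction cost
    computed over labeled and unlabeled samples together."""
    xs, ys = labeled
    probs = predict(xs)                                    # (n, n_classes)
    ce = -np.mean(np.log(probs[np.arange(len(ys)), ys] + 1e-12))
    recon = denoising_recon_cost(np.vstack([xs, unlabeled]), encode, decode)
    return ce + lam * recon
```

The key property this sketch preserves is that the reconstruction term is label-free, so unlabeled i-vectors still shape the learned representation while the cross-entropy term fits the classifier on the labeled subset.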

Multilingual Open Relation Extraction Using Cross-lingual Projection

Jun 05, 2015
Manaal Faruqui, Shankar Kumar

Open domain relation extraction systems identify relation and argument phrases in a sentence without relying on any underlying schema. However, current state-of-the-art relation extraction systems are available only for English because of their heavy reliance on linguistic tools such as part-of-speech taggers and dependency parsers. We present a cross-lingual annotation projection method for language independent relation extraction. We evaluate our method on a manually annotated test set and present results on three typologically different languages. We release these manual annotations and extracted relations in 61 languages from Wikipedia.

* Proceedings of NAACL 2015 

Combining semantic and syntactic structure for language modeling

Oct 24, 2001
Rens Bod

Structured language models for speech recognition have been shown to remedy the weaknesses of n-gram models. All current structured language models are, however, limited in that they do not take into account dependencies between non-headwords. We show that non-headword dependencies contribute to significantly improved word error rate, and that a data-oriented parsing model trained on semantically and syntactically annotated data can exploit these dependencies. This paper also contains the first DOP model trained by means of a maximum likelihood reestimation procedure, which solves some of the theoretical shortcomings of previous DOP models.

* Proceedings ICSLP'2000, Beijing, China 
* 4 pages 

The TreeBanker: a Tool for Supervised Training of Parsed Corpora

Jul 02, 1997
David Carter

I describe the TreeBanker, a graphical tool for the supervised training involved in domain customization of the disambiguation component of a speech- or language-understanding system. The TreeBanker presents a user, who need not be a system expert, with a range of properties that distinguish competing analyses for an utterance and that are relatively easy to judge. This allows training on a corpus to be completed in far less time, and with far less expertise, than would be needed if analyses were inspected directly: it becomes possible for a corpus of about 20,000 sentences of the complexity of those in the ATIS corpus to be judged in around three weeks of work by a linguistically aware non-expert.

* "Computational Environments ..." (ENVGRAM) workshop at ACL-97 
* 7 pages, needs aclap.sty. Replacement just corrects Figure 3 

The slurk Interaction Server Framework: Better Data for Better Dialog Models

Feb 02, 2022
Jana Götze, Maike Paetzel-Prüsmann, Wencke Liermann, Tim Diekmann, David Schlangen

This paper presents the slurk software, a lightweight interaction server for setting up dialog data collections and running experiments. Slurk enables a multitude of settings, including text-based, speech, and video interaction between two or more humans, or between humans and bots, and a multimodal display area for presenting shared or private interactive context. The software is implemented in Python with an HTML and JS frontend that can easily be adapted to individual needs. It also provides a setup for pairing participants on common crowdworking platforms such as Amazon Mechanical Turk, and example bot scripts for common interaction scenarios.

* submitted to LREC 2022 

Mind-proofing Your Phone: Navigating the Digital Minefield with GreaseTerminator

Dec 20, 2021
Siddhartha Datta, Konrad Kollnig, Nigel Shadbolt

Digital harms are widespread in the mobile ecosystem. As these devices gain ever more prominence in our daily lives, so too increases the potential for malicious attacks against individuals. The last line of defense against a range of digital harms - including digital distraction, political polarisation through hate speech, and children being exposed to damaging material - is the user interface. This work introduces GreaseTerminator to enable researchers to develop, deploy, and test interventions against these harms with end-users. We demonstrate the ease of intervention development and deployment, as well as the broad range of harms potentially covered with GreaseTerminator in five in-depth case studies.

* Accepted in ACM IUI 2022 
