"speech": models, code, and papers

Communication Modalities for Supervised Teleoperation in Highly Dexterous Tasks - Does one size fit all?

Apr 17, 2017
Tian Zhou, Maria E. Cabrera, Juan P. Wachs

This study tries to explain the connection between communication modalities and levels of supervision in teleoperation during a dexterous task, like surgery. This concept is applied to two surgery-related tasks: incision and peg transfer. It was found that as the complexity of the task escalates, the combination linking human supervision with a more expressive modality shows better performance than other combinations of modalities and control. More specifically, in the peg transfer task, the combination of speech modality and action level supervision achieves shorter task completion time (77.1 ± 3.4 s) with fewer mistakes (0.20 ± 0.17 pegs dropped).

* Previously published online at the 2nd Workshop on the Role of Human Sensorimotor Control in Surgical Robotics at the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany


Discriminative Regularization for Generative Models

Feb 15, 2016
Alex Lamb, Vincent Dumoulin, Aaron Courville

We explore the question of whether the representations learned by classifiers can be used to enhance the quality of generative models. Our conjecture is that labels correspond to characteristics of natural data which are most salient to humans: identity in faces, objects in images, and utterances in speech. We propose to take advantage of this by using the representations from discriminative classifiers to augment the objective function corresponding to a generative model. In particular we enhance the objective function of the variational autoencoder, a popular generative model, with a discriminative regularization term. We show that enhancing the objective function in this way leads to samples that are clearer and have higher visual quality than the samples from the standard variational autoencoders.


Polyglot: Distributed Word Representations for Multilingual NLP

Jun 27, 2014
Rami Al-Rfou, Bryan Perozzi, Steven Skiena

Distributed word representations (word embeddings) have recently contributed to competitive performance in language modeling and several NLP tasks. In this work, we train word embeddings for more than 100 languages using their corresponding Wikipedias. We quantitatively demonstrate the utility of our word embeddings by using them as the sole features for training a part of speech tagger for a subset of these languages. We find their performance to be competitive with near state-of-art methods in English, Danish and Swedish. Moreover, we investigate the semantic features captured by these embeddings through the proximity of word groupings. We will release these embeddings publicly to help researchers in the development and enhancement of multilingual applications.

* 10 pages, 2 figures, Proceedings of Conference on Computational Natural Language Learning CoNLL'2013 


Locality-Sensitive Hashing with Margin Based Feature Selection

Oct 11, 2012
Makiko Konoshima, Yui Noma

We propose a learning method with feature selection for Locality-Sensitive Hashing. Locality-Sensitive Hashing converts feature vectors into bit arrays. These bit arrays can be used to perform similarity searches and personal authentication. The proposed method starts from bit arrays longer than those ultimately used for similarity and other searches, and learns which bits to keep. We demonstrate that this method can effectively perform optimization for cases such as fingerprint images, with a large number of labels and extremely few data sharing the same label, and verify that it is also effective for natural images, handwritten digits, and speech features.

* 9 pages, 6 figures, 3 tables 
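
The idea in this abstract (hash to more bits than needed, then keep only the useful ones) can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say which hash family or learning rule is used, so the sketch assumes sign-random-projection hashing and a simple hand-made margin-like score (how often a bit differs across labels minus how often it differs within a label). All function names are hypothetical.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw one random Gaussian hyperplane per output bit (assumed hash family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_bits(vec, planes):
    """Convert a feature vector into a bit array: one sign bit per hyperplane."""
    return [1 if sum(w * x for w, x in zip(plane, vec)) >= 0 else 0
            for plane in planes]


def hamming(a, b):
    """Number of positions at which two bit arrays differ."""
    return sum(x != y for x, y in zip(a, b))


def select_bits(codes, labels, keep):
    """Keep the `keep` bit positions with the best margin-like score.

    Score (an illustrative assumption, not the paper's criterion):
    fraction of different-label pairs that disagree on the bit
    minus fraction of same-label pairs that disagree on it.
    """
    n_bits = len(codes[0])
    scores = []
    for b in range(n_bits):
        same = diff = n_same = n_diff = 0
        for i in range(len(codes)):
            for j in range(i + 1, len(codes)):
                disagree = codes[i][b] != codes[j][b]
                if labels[i] == labels[j]:
                    same += disagree
                    n_same += 1
                else:
                    diff += disagree
                    n_diff += 1
        between = diff / n_diff if n_diff else 0.0
        within = same / n_same if n_same else 0.0
        scores.append(between - within)
    ranked = sorted(range(n_bits), key=lambda b: scores[b], reverse=True)
    return sorted(ranked[:keep])


def project(code, selected):
    """Reduce a long bit array to the selected bit positions."""
    return [code[b] for b in selected]
```

In use, one would hash labelled training vectors with, say, 64 hyperplanes, call `select_bits(codes, labels, keep=16)`, and then compare new items by the Hamming distance between their projected 16-bit codes.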


Evaluation of the NLP Components of the OVIS2 Spoken Dialogue System

Jun 14, 1999
Gert Veldhuijzen van Zanten, Gosse Bouma, Khalil Sima'an, Gertjan van Noord, Remko Bonnema

The NWO Priority Programme Language and Speech Technology is a 5-year research programme aiming at the development of spoken language information systems. In the Programme, two alternative natural language processing (NLP) modules are developed in parallel: a grammar-based (conventional, rule-based) module and a data-oriented (memory-based, stochastic, DOP) module. In order to compare the NLP modules, a formal evaluation has been carried out three years after the start of the Programme. This paper describes the evaluation procedure and the evaluation results. The grammar-based component performs much better than the data-oriented one in this comparison.

* Proceedings of CLIN 99 


Analysing the Greek Parliament Records with Emotion Classification

May 24, 2022
John Pavlopoulos, Vanessa Lislevand

In this project, we tackle emotion classification for the Greek language, presenting and releasing a new dataset in Greek. We fine-tune and assess Transformer-based masked language models that were pre-trained on monolingual and multilingual resources, and we present the results per emotion and by aggregating at the sentiment and subjectivity level. The potential of the presented resources is investigated by detecting and studying the emotion of `disgust' in the Greek Parliament records. We: (a) locate the months with the highest values from 1989 to present, (b) rank the Greek political parties based on the presence of this emotion in their speeches, and (c) study the emotional context shift of words used to stigmatise people.


Developing Universal Dependency Treebanks for Magahi and Braj

Apr 26, 2022
Mohit Raj, Shyam Ratan, Deepak Alok, Ritesh Kumar, Atul Kr. Ojha

In this paper, we discuss the development of treebanks for two low-resourced Indian languages, Magahi and Braj, based on the Universal Dependencies framework. The Magahi treebank contains 945 sentences and the Braj treebank around 500 sentences, marked with their lemmas, part-of-speech tags, morphological features and universal dependencies. This paper describes the different dependency relationships found in the two languages and gives some statistics of the two treebanks. The dataset will be made publicly available in the Universal Dependencies (UD) repository in the next (v2.10) release.

* 11 pages, Workshop on Parsing and its Applications for Indian Languages (PAIL-2021) at ICON 2021 


Oracle Linguistic Graphs Complement a Pretrained Transformer Language Model: A Cross-formalism Comparison

Dec 15, 2021
Jakob Prange, Nathan Schneider, Lingpeng Kong

We examine the extent to which, in principle, linguistic graph representations can complement and improve neural language modeling. With an ensemble setup consisting of a pretrained Transformer and ground-truth graphs from one of 7 different formalisms, we find that, overall, semantic constituency structures are most useful to language modeling performance -- outpacing syntactic constituency structures as well as syntactic and semantic dependency structures. Further, effects vary greatly depending on part-of-speech class. In sum, our findings point to promising tendencies in neuro-symbolic language modeling and invite future research quantifying the design choices made by different formalisms.


Apurinã Universal Dependencies Treebank

Jun 07, 2021
Jack Rueter, Marília Fernanda Pereira de Freitas, Sidney da Silva Facundes, Mika Hämäläinen, Niko Partanen

This paper presents and discusses the first Universal Dependencies treebank for the Apurinã language. The treebank contains 76 fully annotated sentences, applies 14 parts-of-speech, as well as seven augmented or new features - some of which are unique to Apurinã. The construction of the treebank has also served as an opportunity to develop finite-state description of the language and facilitate the transfer of open-source infrastructure possibilities to an endangered language of the Amazon. The source materials used in the initial treebank represent fieldwork practices where not all tokens of all sentences are equally annotated. For this reason, establishing regular annotation practices for the entire Apurinã treebank is an ongoing project.

* The First Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP) 


dEchorate: a Calibrated Room Impulse Response Database for Echo-aware Signal Processing

Apr 27, 2021
Diego Di Carlo, Pinchas Tandeitnik, Cédric Foy, Antoine Deleforge, Nancy Bertin, Sharon Gannot

This paper presents dEchorate: a new database of measured multichannel Room Impulse Responses (RIRs) including annotations of early echo timings and 3D positions of microphones, real sources and image sources under different wall configurations in a cuboid room. These data provide a tool for benchmarking recent methods in echo-aware speech enhancement, room geometry estimation, RIR estimation, acoustic echo retrieval, microphone calibration, echo labeling and reflector estimation. The database is accompanied by software utilities to easily access, manipulate and visualize the data as well as baseline methods for echo-related tasks.
