Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"speech": models, code, and papers

On the Optimality of Vagueness: "Around", "Between", and the Gricean Maxims

Aug 26, 2020
Paul Egré, Benjamin Spector, Adèle Mortier, Steven Verheyen

Why is our language vague? We argue that in contexts in which a cooperative speaker is not perfectly informed about the world, the use of vague expressions can offer an optimal tradeoff between truthfulness (Gricean Quality) and informativeness (Gricean Quantity). Focusing on expressions of approximation such as "around", which are semantically vague, we show that they allow the speaker to convey indirect probabilistic information, in a way that gives the listener a more accurate representation of the information available to the speaker than any more precise expression would (intervals of the form "between"). We give a probabilistic treatment of the interpretation of "around", and offer a model for the interpretation and use of "around"-statements within the Rational Speech Act (RSA) framework. Our model differs in substantive ways from the Lexical Uncertainty model often used within the RSA framework for vague predicates.

  Access Paper or Ask Questions

Towards Finite-State Morphology of Kurdish

May 21, 2020
Sina Ahmadi, Hossein Hassani

Morphological analysis is the study of the formation and structure of words. It plays a crucial role in various tasks in Natural Language Processing (NLP) and Computational Linguistics (CL) such as machine translation and text and speech generation. Kurdish is a less-resourced multi-dialect Indo-European language with highly inflectional morphology. In this paper, as the first attempt of its kind, the morphology of the Kurdish language (Sorani dialect) is described from a computational point of view. We extract morphological rules which are transformed into finite-state transducers for generating and analyzing words. The result of this research assists in conducting studies on language generation for Kurdish and enhances the Information Retrieval (IR) capacity for the language while leveraging the Kurdish NLP and CL into a more advanced computational level.

* Manuscript submitted to ACM-TALLIP 

  Access Paper or Ask Questions

Offensive Language Identification in Greek

Mar 18, 2020
Zeses Pitenis, Marcos Zampieri, Tharindu Ranasinghe

As offensive language has become a rising issue for online communities and social media platforms, researchers have been investigating ways of coping with abusive content and developing systems to detect its different types: cyberbullying, hate speech, aggression, etc. With a few notable exceptions, most research on this topic so far has dealt with English. This is mostly due to the availability of language resources for English. To address this shortcoming, this paper presents the first Greek annotated dataset for offensive language identification: the Offensive Greek Tweet Dataset (OGTD). OGTD is a manually annotated dataset containing 4,779 posts from Twitter annotated as offensive and not offensive. Along with a detailed description of the dataset, we evaluate several computational models trained and tested on this data.

* Accepted to LREC 2020 

  Access Paper or Ask Questions

Improving automated segmentation of radio shows with audio embeddings

Feb 12, 2020
Oberon Berlage, Klaus-Michael Lux, David Graus

Audio features have been proven useful for increasing the performance of automated topic segmentation systems. This study explores the novel task of using audio embeddings for automated, topically coherent segmentation of radio shows. We created three different audio embedding generators using multi-class classification tasks on three datasets from different domains. We evaluate topic segmentation performance of the audio embeddings and compare it against a text-only baseline. We find that a set-up including audio embeddings generated through a non-speech sound event classification task significantly outperforms our text-only baseline by 32.3% in F1-measure. In addition, we find that different classification tasks yield audio embeddings that vary in segmentation performance.

* 5 pages, 2 figures, submitted to ICASSP2020 

  Access Paper or Ask Questions

Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks

May 27, 2019
Boris Ginsburg, Patrice Castonguay, Oleksii Hrinchuk, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Huyen Nguyen, Jonathan M. Cohen

We propose NovoGrad, a first-order stochastic gradient method with layer-wise gradient normalization via second moment estimators and with decoupled weight decay for a better regularization. The method requires half as much memory as Adam/AdamW. We evaluated NovoGrad on the diverse set of problems, including image classification, speech recognition, neural machine translation and language modeling. On these problems, NovoGrad performed equal to or better than SGD and Adam/AdamW. Empirically we show that NovoGrad (1) is very robust during the initial training phase and does not require learning rate warm-up, (2) works well with the same learning rate policy for different problems, and (3) generally performs better than other optimizers for very large batch sizes

* Submitted to NeurIPS 2019 

  Access Paper or Ask Questions

Free Component Analysis: Theory, Algorithms & Applications

May 05, 2019
Hao Wu, Raj Rao Nadakuditi

We describe a method for unmixing mixtures of freely independent random variables in a manner analogous to the independent component analysis (ICA) based method for unmixing independent random variables from their additive mixtures. Random matrices play the role of free random variables in this context so the method we develop, which we call Free component analysis (FCA), unmixes matrices from additive mixtures of matrices. We describe the theory, the various algorithms, and compare FCA to ICA. We show that FCA performs comparably to, and often better than, ICA in every application, such as image and speech unmixing, where ICA has been known to succeed. Our computational experiments suggest that not-so-random matrices, such as images and spectrograms of waveforms are (closer to being) freer "in the wild" than we might have theoretically expected.

* 66 pages, 15 figures 

  Access Paper or Ask Questions

Teaching Machines to Code: Neural Markup Generation with Visual Attention

Jun 15, 2018
Sumeet S. Singh

We present a neural transducer model with visual attention that learns to generate LaTeX markup of a real-world math formula given its image. Applying sequence modeling and transduction techniques that have been very successful across modalities such as natural language, image, handwriting, speech and audio; we construct an image-to-markup model that learns to produce syntactically and semantically correct LaTeX markup code over 150 words long and achieves a BLEU score of 89%; improving upon the previous state-of-art for the Im2Latex problem. We also demonstrate with heat-map visualization how attention helps in interpreting the model and can pinpoint (detect and localize) symbols on the image accurately despite having been trained without any bounding box data.

* For datasets, visualizations and ancillary material see: . For source code go to: 

  Access Paper or Ask Questions

Advancing Connectionist Temporal Classification With Attention Modeling

Mar 15, 2018
Amit Das, Jinyu Li, Rui Zhao, Yifan Gong

In this study, we propose advancing all-neural speech recognition by directly incorporating attention modeling within the Connectionist Temporal Classification (CTC) framework. In particular, we derive new context vectors using time convolution features to model attention as part of the CTC network. To further improve attention modeling, we utilize content information extracted from a network representing an implicit language model. Finally, we introduce vector based attention weights that are applied on context vectors across both time and their individual components. We evaluate our system on a 3400 hours Microsoft Cortana voice assistant task and demonstrate that our proposed model consistently outperforms the baseline model achieving about 20% relative reduction in word error rates.

* Accepted at ICASSP 2018 

  Access Paper or Ask Questions

Experiments with POS Tagging Code-mixed Indian Social Media Text

Oct 31, 2016
Prakash B. Pimpale, Raj Nath Patel

This paper presents Centre for Development of Advanced Computing Mumbai's (CDACM) submission to the NLP Tools Contest on Part-Of-Speech (POS) Tagging For Code-mixed Indian Social Media Text (POSCMISMT) 2015 (collocated with ICON 2015). We submitted results for Hindi (hi), Bengali (bn), and Telugu (te) languages mixed with English (en). In this paper, we have described our approaches to the POS tagging techniques, we exploited for this task. Machine learning has been used to POS tag the mixed language text. For POS tagging, distributed representations of words in vector space (word2vec) for feature extraction and Log-linear models have been tried. We report our work on all three languages hi, bn, and te mixed with en.

* In the Proceedings of the 12th International Conference on Natural Language Processing (ICON 2015) 
* 3 Pages, Published in the Proceedings of the Tool Contest on POS Tagging for Code-mixed Indian Social Media (Facebook, Twitter, and Whatsapp) Text 

  Access Paper or Ask Questions

Generation and Pruning of Pronunciation Variants to Improve ASR Accuracy

Jun 28, 2016
Zhenhao Ge, Aravind Ganapathiraju, Ananth N. Iyer, Scott A. Randal, Felix I. Wyss

Speech recognition, especially name recognition, is widely used in phone services such as company directory dialers, stock quote providers or location finders. It is usually challenging due to pronunciation variations. This paper proposes an efficient and robust data-driven technique which automatically learns acceptable word pronunciations and updates the pronunciation dictionary to build a better lexicon without affecting recognition of other words similar to the target word. It generalizes well on datasets with various sizes, and reduces the error rate on a database with 13000+ human names by 42%, compared to a baseline with regular dictionaries already covering canonical pronunciations of 97%+ words in names, plus a well-trained spelling-to-pronunciation (STP) engine.

* Interspeech 2016 

  Access Paper or Ask Questions