"speech": models, code, and papers

Nonlinear prediction with neural nets in ADPCM

Mar 22, 2022
Marcos Faundez-Zanuy, Francesc Vallverdu, Enric Monte

In recent years there has been growing interest in nonlinear speech models. Several works have been published revealing the better performance of nonlinear techniques, but little attention has been dedicated to the implementation of the nonlinear model in real applications. This work focuses on the behaviour of a nonlinear predictive model based on neural nets in a speech waveform coder. Our novel scheme obtains an improvement in SEGSNR of between 1 and 2 dB for adaptive quantization ranging from 2 to 5 bits.

* Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), 1998, pp. 345-348 vol.1 
* 4 pages, published in Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181) Seattle, WA, USA. arXiv admin note: text overlap with arXiv:2203.01818 
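
The abstract above reports its gains in SEGSNR (segmental signal-to-noise ratio). As a rough illustration only, the sketch below computes one common form of that metric: the mean of per-frame SNR values in dB. The function name `segmental_snr`, the 160-sample frame length, and the rule of skipping silent frames are assumptions made for this example; the abstract does not state the exact definition used in the paper.

```python
import math


def segmental_snr(reference, decoded, frame_len=160, eps=1e-12):
    """Mean per-frame SNR in dB between a reference and a decoded signal.

    Assumptions (not taken from the paper): non-overlapping frames of
    `frame_len` samples, trailing partial frame ignored, silent frames skipped.
    """
    if len(reference) != len(decoded):
        raise ValueError("signals must have the same length")

    frame_snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, dec))
        if signal_energy <= eps:
            continue  # a silent frame has no meaningful SNR
        # Floor the noise energy so a perfectly coded frame does not divide by zero.
        ratio = signal_energy / max(noise_energy, eps)
        frame_snrs.append(10.0 * math.log10(ratio))

    if not frame_snrs:
        raise ValueError("no non-silent full frames to evaluate")
    return sum(frame_snrs) / len(frame_snrs)


# Toy usage: a constant signal reproduced with a 10% amplitude error.
reference = [1.0] * 320
decoded = [0.9] * 320
result = segmental_snr(reference, decoded)
```

Averaging in the dB domain per frame, rather than over the whole utterance, keeps loud segments from dominating the score, which is why the segmental form is often preferred for waveform coders. Some definitions also clamp each frame's SNR to a fixed range before averaging; that step is omitted here.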

Syllabification by Phone Categorization

Jul 15, 2018
Jacob Krantz, Maxwell Dulin, Paul De Palma, Mark VanDam

Syllables play an important role in speech synthesis, speech recognition, and spoken document retrieval. A novel, low cost, and language agnostic approach to dividing words into their corresponding syllables is presented. A hybrid genetic algorithm constructs a categorization of phones optimized for syllabification. This categorization is used on top of a hidden Markov model sequence classifier to find syllable boundaries. The technique shows promising preliminary results when trained and tested on English words.

* Jacob Krantz, Maxwell Dulin, Paul De Palma, and Mark VanDam. 2018. Syllabification by Phone Categorization. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO '18) 47-48 

Unified and Multilingual Author Profiling for Detecting Haters

Sep 19, 2021
Ipek Baris Schlicht, Angel Felipe Magnossão de Paula

This paper presents a unified user profiling framework to identify hate speech spreaders by processing their tweets regardless of the language. The framework encodes the tweets with sentence transformers and applies an attention mechanism to select important tweets for learning user profiles. Furthermore, the attention layer helps to explain why a user is a hate speech spreader by producing attention weights at both token and post level. Our proposed model outperformed the state-of-the-art multilingual transformer models.

* Published at the CLEF 2021 
* 9 pages, 2 figures, see the original paper: 

Refinement of a Structured Language Model

Jan 24, 2000
Ciprian Chelba, Frederick Jelinek

A new language model for speech recognition inspired by linguistic analysis is presented. The model develops hidden hierarchical structure incrementally and uses it to extract meaningful information from the word history - thus enabling the use of extended distance dependencies - in an attempt to complement the locality of currently used n-gram Markov models. The model, its probabilistic parametrization, a reestimation algorithm for the model parameters and a set of experiments meant to evaluate its potential for speech recognition are presented.

* Proceedings of the International Conference on Advances in Pattern Recognition, 1998, pp. 275-284, Plymouth, UK 
* 10 pages 

May I Ask Who's Calling? Named Entity Recognition on Call Center Transcripts for Privacy Law Compliance

Oct 29, 2020
Micaela Kaplan

We investigate using Named Entity Recognition on a new type of user-generated text: a call center conversation. These conversations combine problems from spontaneous speech with problems novel to conversational Automated Speech Recognition, including incorrect recognition, alongside other common problems from noisy user-generated text. Using our own corpus with new annotations, training custom contextual string embeddings, and applying a BiLSTM-CRF, we match state-of-the-art results on our novel task.

* Proceedings of the 2020 EMNLP Workshop W-NUT: The Sixth Workshop on Noisy User-generated Text (2020) 1-6 
* The 6th Workshop on Noisy User-generated Text (W-NUT) 2020 at EMNLP 

Automatically Identifying Language Family from Acoustic Examples in Low Resource Scenarios

Dec 01, 2020
Peter Wu, Yifan Zhong, Alan W Black

Existing multilingual speech NLP works focus on a relatively small subset of languages, and thus current linguistic understanding of languages predominantly stems from classical approaches. In this work, we propose a method to analyze language similarity using deep learning. Namely, we train a model on the Wilderness dataset and investigate how its latent space compares with classical language family findings. Our approach provides a new direction for cross-lingual data augmentation in any speech-based NLP task.

Abusive Language Detection and Characterization of Twitter Behavior

Sep 26, 2020
Dincy Davis, Reena Murali, Remesh Babu

In this work, abusive language detection in online content is performed using a Bidirectional Recurrent Neural Network (BiRNN) method. The main objective is to focus on various forms of abusive behavior on Twitter and to detect whether a speech is abusive or not. The results are compared for various abusive behaviors in social media against Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) methods, and show that the proposed BiRNN is a better deep learning model for automatic abusive speech detection.

* International Journal of Computer Sciences and Engineering, Vol.8, Issue.7, July 2020 
* 7 pages, 7 figures and 8 tables 

A Generative Model of a Pronunciation Lexicon for Hindi

May 06, 2017
Pramod Pandey, Somnath Roy

Voice browser applications in Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) systems crucially depend on a pronunciation lexicon. The present paper describes the model of pronunciation lexicon of Hindi developed to automatically generate the output forms of Hindi at two levels, the and the (PS, in short for Prosodic Structure). The latter level involves both syllable-division and stress placement. The paper describes the tool developed for generating the two-level outputs of lexica in Hindi.

:telephone::person::sailboat::whale::okhand:; or "Call me Ishmael" - How do you translate emoji?

Nov 07, 2016
Will Radford, Andrew Chisholm, Ben Hachey, Bo Han

We report on an exploratory analysis of Emoji Dick, a project that leverages crowdsourcing to translate Melville's Moby Dick into emoji. This distinctive use of emoji removes textual context, and leads to a varying translation quality. In this paper, we use statistical word alignment and part-of-speech tagging to explore how people use emoji. Despite these simple methods, we observed differences in token and part-of-speech distributions. Experiments also suggest that semantics are preserved in the translation, and repetition is more common in emoji.

LemmaTag: Jointly Tagging and Lemmatizing for Morphologically-Rich Languages with BRNNs

Aug 27, 2018
Daniel Kondratyuk, Tomáš Gavenčiak, Milan Straka, Jan Hajič

We present LemmaTag, a featureless neural network architecture that jointly generates part-of-speech tags and lemmas for sentences by using bidirectional RNNs with character-level and word-level embeddings. We demonstrate that both tasks benefit from sharing the encoding part of the network, predicting tag subcategories, and using the tagger output as an input to the lemmatizer. We evaluate our model across several languages with complex morphology, which surpasses state-of-the-art accuracy in both part-of-speech tagging and lemmatization in Czech, German, and Arabic.

* 8 pages, 3 figures. Submitted to EMNLP 2018 
