
"speech": models, code, and papers

An auditory cortex model for sound processing

Mar 08, 2021
Rand Asswad, Ugo Boscain, Giuseppina Turco, Dario Prandi, Ludovic Sacchelli

The mechanisms the human auditory system uses to reconstruct degraded sounds are still a matter of debate. The purpose of this study is to refine the auditory cortex model introduced in [9], which is inspired by the geometrical modelling of vision. The algorithm transforms the degraded sound into an 'image' in the time-frequency domain via a short-time Fourier transform. This image is then lifted to the Heisenberg group and reconstructed via a Wilson-Cowan integro-differential equation. Numerical experiments on a library of speech recordings demonstrate the good reconstruction properties of the algorithm.
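The first step of the pipeline, turning a sound into a time-frequency 'image', can be sketched with a plain short-time Fourier transform. A minimal NumPy version follows; the frame length, hop size, and Hann window are illustrative choices, not the paper's parameters:

```python
import numpy as np

def stft_image(signal, frame_len=256, hop=128):
    """Sketch: turn a 1-D signal into a time-frequency 'image'
    via a short-time Fourier transform with a Hann window."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)   # one FFT per windowed frame
    return np.abs(spectrum).T                # rows: frequency bins, cols: time frames

t = np.linspace(0, 1, 4096, endpoint=False)
image = stft_image(np.sin(2 * np.pi * 440 * t))
print(image.shape)  # (129, 31)
```

The lifting to the Heisenberg group and the Wilson-Cowan evolution then operate on this representation; they are beyond the scope of this sketch.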

* arXiv admin note: substantial text overlap with arXiv:2004.02450 


Semi-supervised URL Segmentation with Recurrent Neural Networks Pre-trained on Knowledge Graph Entities

Nov 05, 2020
Hao Zhang, Jae Ro, Richard Sproat

Breaking domain names such as openresearch into component words open and research is important for applications like Text-to-Speech synthesis and web search. We link this problem to the classic problem of Chinese word segmentation and show the effectiveness of a tagging model based on Recurrent Neural Networks (RNNs) using characters as input. To compensate for the lack of training data, we propose a pre-training method on concatenated entity names in a large knowledge database. Pre-training improves the model by 33% and brings the sequence accuracy to 85%.
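The link to Chinese word segmentation can be illustrated with the classic dictionary-based baseline (not the paper's RNN tagger): a dynamic program that splits a concatenated name into the fewest in-vocabulary words. The lexicon here is a made-up toy set:

```python
def segment(name, lexicon):
    """Sketch of dictionary-based word segmentation: dynamic programming
    for the split of `name` into the fewest words found in `lexicon`.
    Returns None if no full segmentation exists."""
    n = len(name)
    best = [None] * (n + 1)   # best[i] = shortest word list covering name[:i]
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and name[j:i] in lexicon:
                if best[i] is None or len(best[j]) + 1 < len(best[i]):
                    best[i] = best[j] + [name[j:i]]
    return best[n]

print(segment("openresearch", {"open", "research", "ope", "search"}))
# ['open', 'research']
```

An RNN tagger instead labels each character (e.g. begin/inside a word), which is what the pre-training on knowledge-graph entity names improves.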


Bayesian Neural Networks: An Introduction and Survey

Jun 22, 2020
Ethan Goan, Clinton Fookes

Neural Networks (NNs) have provided state-of-the-art results for many challenging machine learning tasks such as detection, regression and classification across the domains of computer vision, speech recognition and natural language processing. Despite their success, they are often implemented in a frequentist scheme, meaning they are unable to reason about uncertainty in their predictions. This article introduces Bayesian Neural Networks (BNNs) and the seminal research regarding their implementation. Different approximate inference methods are compared, and used to highlight where future research can improve on current methods.
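The uncertainty-aware reasoning that frequentist networks lack can be shown in the simplest Bayesian model surveyed in this line of work: Bayesian linear regression, where the weight posterior is exact and yields a predictive variance. The precisions `alpha` and `beta` and the synthetic data below are assumed values for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Exact posterior for Bayesian linear regression: prior precision alpha,
# observation noise precision beta (both illustrative settings).
alpha, beta = 1.0, 25.0
X = rng.uniform(-1, 1, size=(20, 1))
y = 1.5 * X[:, 0] + rng.normal(0, 0.2, size=20)

Phi = np.hstack([np.ones((20, 1)), X])           # bias column + input feature
S_inv = alpha * np.eye(2) + beta * Phi.T @ Phi   # posterior precision matrix
S = np.linalg.inv(S_inv)
m = beta * S @ Phi.T @ y                         # posterior mean of the weights

x_star = np.array([1.0, 0.5])                    # query point (with bias term)
pred_mean = x_star @ m
pred_var = 1 / beta + x_star @ S @ x_star        # predictive variance
print(pred_mean, pred_var)
```

For deep networks the posterior is intractable, which is exactly why the approximate inference methods compared in the article are needed.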

* Case Studies in Applied Bayesian Data Science: CIRM Jean-Morlet Chair, Fall 2018, 1, (2020) 45-87 
* 44 pages, 8 figures 


On the Choice of Auxiliary Languages for Improved Sequence Tagging

May 19, 2020
Lukas Lange, Heike Adel, Jannik Strötgen

Recent work showed that embeddings from related languages can improve the performance of sequence tagging, even for monolingual models. In this analysis paper, we investigate whether the best auxiliary language can be predicted based on language distances and show that the most related language is not always the best auxiliary language. Further, we show that attention-based meta-embeddings can effectively combine pre-trained embeddings from different languages for sequence tagging and set new state-of-the-art results for part-of-speech tagging in five languages.
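The attention-based combination of per-language embeddings can be sketched in a few lines: score each source embedding with a learned vector, softmax the scores, and take the weighted sum. The scoring vector `attn_w` below stands in for the learned parameters, and real systems also project each embedding into a common space first:

```python
import numpy as np

def meta_embedding(embs, attn_w):
    """Sketch of attention-based meta-embeddings: compute softmax attention
    weights over the source embeddings and return the weighted sum."""
    scores = np.array([e @ attn_w for e in embs])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    combined = sum(w * e for w, e in zip(weights, embs))
    return weights, combined

embs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # two languages' embeddings
weights, combined = meta_embedding(embs, np.array([2.0, 0.0]))
print(weights)  # the first embedding receives the larger weight
```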

* RepL4NLP at ACL 2020 


Verbal Programming of Robot Behavior

Nov 21, 2019
Jonathan Connell

Home robots may come with many sophisticated built-in abilities; however, there will always be a degree of customization needed for each user and environment. Ideally, this should be accomplished through one-shot learning, as collecting the large number of examples needed for statistical inference is tedious. A particularly appealing approach is to simply explain to the robot, via speech, what it should be doing. In this paper we describe the ALIA cognitive architecture, which is able to effectively incorporate user-supplied advice and prohibitions in this manner. The functioning of the implemented system on a small robot is illustrated by an associated video.


Detecting dementia in Mandarin Chinese using transfer learning from a parallel corpus

Mar 03, 2019
Bai Li, Yi-Te Hsu, Frank Rudzicz

Machine learning has shown promise for automatic detection of Alzheimer's disease (AD) through speech; however, efforts are hampered by a scarcity of data, especially in languages other than English. We propose a method to learn a correspondence between independently engineered lexicosyntactic features in two languages, using a large parallel corpus of out-of-domain movie dialogue data. We apply it to dementia detection in Mandarin Chinese, and demonstrate that our method outperforms both unilingual and machine translation-based baselines. This appears to be the first study that transfers feature domains in detecting cognitive decline.
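Learning a correspondence between independently engineered feature spaces can be illustrated, in a heavily simplified form, as fitting a linear map between paired feature vectors by least squares. The dimensions and synthetic data below are illustrative only; the paper's actual method and features differ:

```python
import numpy as np

rng = np.random.default_rng(1)

# Paired feature vectors from a (synthetic) parallel corpus: each row is
# the same utterance featurized on the English side and the Mandarin side.
X_en = rng.normal(size=(100, 5))                  # English-side features
W_true = rng.normal(size=(5, 4))                  # ground-truth correspondence
X_zh = X_en @ W_true + rng.normal(0, 0.01, size=(100, 4))

# Fit the linear correspondence by least squares.
W, *_ = np.linalg.lstsq(X_en, X_zh, rcond=None)
print(np.allclose(W, W_true, atol=0.05))  # True
```

Such a map lets a model trained on the data-rich language's features be applied where labeled data is scarce.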

* NAACL 2019 (Short paper) 


Topic-Specific Sentiment Analysis Can Help Identify Political Ideology

Oct 30, 2018
Sumit Bhatia, Deepak P

Ideological leanings of an individual can often be gauged by the sentiment one expresses about different issues. We propose a simple framework that represents a political ideology as a distribution of sentiment polarities towards a set of topics. This representation can then be used to detect ideological leanings of documents (speeches, news articles, etc.) based on the sentiments expressed towards different topics. Experiments performed using a widely used dataset show the promise of our proposed approach that achieves comparable performance to other methods despite being much simpler and more interpretable.
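The core representation, an ideology as a vector of per-topic sentiment polarities, suggests a very small classifier: assign a document's topic-sentiment vector to the nearest ideology by cosine similarity. The topics, polarity values, and labels below are illustrative, not from the paper's data:

```python
import numpy as np

# An ideology as per-topic sentiment polarities in [-1, 1]
# over illustrative topics: taxes, healthcare, immigration.
ideology_a = np.array([-0.8, 0.6, 0.4])
ideology_b = np.array([0.7, -0.5, -0.6])

def lean(doc_sentiments, ideologies):
    """Assign a document's topic-sentiment vector to the closest
    ideology by cosine similarity."""
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return max(ideologies, key=lambda name: cos(doc_sentiments, ideologies[name]))

doc = np.array([-0.6, 0.9, 0.1])   # sentiments a sentiment analyzer would emit
print(lean(doc, {"A": ideology_a, "B": ideology_b}))  # A
```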

* Presented at EMNLP Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis, 2018 


VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking

Oct 27, 2018
Quan Wang, Hannah Muckenhirn, Kevin Wilson, Prashant Sridhar, Zelin Wu, John Hershey, Rif A. Saurous, Ron J. Weiss, Ye Jia, Ignacio Lopez Moreno

In this paper, we present a novel system that separates the voice of a target speaker from multi-speaker signals, by making use of a reference signal from the target speaker. We achieve this by training two separate neural networks: (1) A speaker recognition network that produces speaker-discriminative embeddings; (2) A spectrogram masking network that takes both noisy spectrogram and speaker embedding as input, and produces a mask. Our system significantly reduces the speech recognition WER on multi-speaker signals, with minimal WER degradation on single-speaker signals.
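The final masking step can be sketched directly: the masking network (conditioned upstream on the speaker embedding) emits a value per time-frequency bin, and the enhanced magnitude is the elementwise product with the noisy spectrogram. The clipping to [0, 1] is an assumption of this sketch:

```python
import numpy as np

def enhance(noisy_mag, mask):
    """Sketch of spectrogram masking: elementwise product of the noisy
    magnitude spectrogram with a per-bin mask clipped to [0, 1]."""
    return noisy_mag * np.clip(mask, 0.0, 1.0)

noisy = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
mask = np.array([[1.0, 0.0],
                 [0.5, 1.2]])      # 1.2 is clipped to 1.0
print(enhance(noisy, mask))
```

The interesting part of the system is of course how the mask is predicted; the speaker-discriminative embedding is what steers it toward the target voice.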

* To be submitted to ICASSP 2019 


A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks

Oct 03, 2018
Dan Hendrycks, Kevin Gimpel

We consider the two related problems of detecting if an example is misclassified or out-of-distribution. We present a simple baseline that utilizes probabilities from softmax distributions. Correctly classified examples tend to have greater maximum softmax probabilities than erroneously classified and out-of-distribution examples, allowing for their detection. We assess performance by defining several tasks in computer vision, natural language processing, and automatic speech recognition, showing the effectiveness of this baseline across all. We then show the baseline can sometimes be surpassed, demonstrating the room for future research on these underexplored detection tasks.
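The baseline itself is compact enough to state in code: score each input by its maximum softmax probability, and flag low scores as likely misclassified or out-of-distribution. The logit values below are made up to show the contrast:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def msp_score(logits):
    """Maximum softmax probability: lower scores suggest a misclassified
    or out-of-distribution input."""
    return softmax(logits).max(axis=-1)

confident = np.array([8.0, 0.5, 0.1])  # peaked, in-distribution-looking logits
uncertain = np.array([1.1, 1.0, 0.9])  # flat, OOD-looking logits
print(msp_score(confident) > msp_score(uncertain))  # True
```

Thresholding this score gives the detector evaluated across the vision, NLP, and speech tasks in the paper.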

* International Conference on Learning Representations 2017 
* Published as a conference paper at ICLR 2017. 1 Figure in 1 Appendix. Minor changes from the previous version 


German Dialect Identification Using Classifier Ensembles

Jul 22, 2018
Alina Maria Ciobanu, Shervin Malmasi, Liviu P. Dinu

In this paper we present the GDI_classification entry to the second German Dialect Identification (GDI) shared task, organized within the scope of the VarDial Evaluation Campaign 2018. Our system is based on SVM classifier ensembles trained on characters and words. It was trained on a collection of speech transcripts of five Swiss-German dialects provided by the organizers; the transcripts included in the dataset covered speakers from Basel, Bern, Lucerne, and Zurich. Our entry reached an F1-score of 62.03% and was ranked third out of eight teams.
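Two ingredients of such a system are easy to sketch: character n-gram features of the kind the SVMs consume, and a simple majority-vote fusion of the ensemble members' labels (the paper's exact feature settings and fusion rule may differ; n=3 is an assumed choice):

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-gram counts, the kind of surface features an SVM
    dialect classifier is trained on (n=3 is an assumed setting)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def majority_vote(predictions):
    """Combine ensemble members' dialect labels by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

print(char_ngrams("grüezi"))
print(majority_vote(["BS", "ZH", "BS"]))  # BS
```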
