"speech": models, code, and papers

Empirically Estimable Classification Bounds Based on a New Divergence Measure

Feb 10, 2015
Visar Berisha, Alan Wisler, Alfred O. Hero, Andreas Spanias

Information divergence functions play a critical role in statistics and information theory. In this paper we show that a non-parametric f-divergence measure can be used to provide improved bounds on the minimum binary classification probability of error for the case when the training and test data are drawn from the same distribution and for the case where there exists some mismatch between training and test distributions. We confirm the theoretical results by designing feature selection algorithms using the criteria from these bounds and by evaluating the algorithms on a series of pathological speech classification tasks.

* 12 pages, 5 figures 
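
The abstract does not spell out how the divergence is estimated from data. Divergences of this family are commonly estimated non-parametrically with the Friedman-Rafsky statistic: build a Euclidean minimum spanning tree (MST) over the pooled samples and count the edges C that join points from different classes. The sketch below assumes the normalization D ≈ 1 − C(m + n)/(2mn) for sample sizes m and n. That formula and the helper names `mst_edges` and `mst_divergence` are illustrative assumptions, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def mst_edges(points: List[Point]) -> List[Tuple[int, int]]:
    """Euclidean minimum spanning tree over `points` via Prim's algorithm, O(N^2)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight connecting each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges: List[Tuple[int, int]] = []
    for _ in range(n):
        # Add the cheapest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # Relax edges leaving the newly added vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_divergence(xs: List[Point], ys: List[Point]) -> float:
    """Hypothetical helper: MST cross-edge divergence estimate.

    ASSUMPTION: D ~= 1 - C * (m + n) / (2 * m * n), where C is the number of
    MST edges joining a point of `xs` to a point of `ys`.
    """
    m, n = len(xs), len(ys)
    pooled = list(xs) + list(ys)
    labels = [0] * m + [1] * n
    cross = sum(1 for a, b in mst_edges(pooled) if labels[a] != labels[b])
    return 1.0 - cross * (m + n) / (2.0 * m * n)
```

Well-separated classes leave few cross edges, so the estimate approaches 1; interleaved samples push it toward or below 0, so a practical estimator would likely clip it at zero.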

Exploiting Syntactic Structure for Language Modeling

Jan 25, 2000
Ciprian Chelba, Frederick Jelinek

The paper presents a language model that develops syntactic structure and uses it to extract meaningful information from the word history, thus enabling the use of long-distance dependencies. The model assigns a probability to every joint sequence of words and binary parse structure with headword annotation, and it operates left to right, which makes it usable for automatic speech recognition. The model, its probabilistic parameterization, and a set of experiments evaluating its predictive power are presented; the model achieves an improvement over standard trigram modeling.

* Proceedings of ACL'98, Montreal, Canada 
* Changed ACM-class membership and fixed buggy author names 
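
The abstract measures its gains against standard trigram modeling. For reference, a minimal sketch of such a baseline follows: count each trigram and its two-word history, then estimate P(w | u, v). Add-one smoothing and the helper names are assumptions chosen for brevity; realistic trigram baselines use stronger smoothing.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Model = Tuple[Counter, Counter, Set[str]]


def train_trigram(sentences: Iterable[List[str]]) -> Model:
    """Hypothetical helper: count trigrams and their two-word histories."""
    trigrams: Counter = Counter()
    histories: Counter = Counter()
    vocab: Set[str] = set()
    for words in sentences:
        padded = ["<s>", "<s>"] + words + ["</s>"]
        vocab.update(words + ["</s>"])  # symbols the model can predict
        for i in range(2, len(padded)):
            history = (padded[i - 2], padded[i - 1])
            trigrams[history + (padded[i],)] += 1
            histories[history] += 1
    return trigrams, histories, vocab


def trigram_prob(model: Model, u: str, v: str, w: str) -> float:
    """P(w | u, v) with add-one smoothing over the predictable vocabulary."""
    trigrams, histories, vocab = model
    return (trigrams[(u, v, w)] + 1) / (histories[(u, v)] + len(vocab))
```

For example, after training on `[["the", "cat", "sat"], ["the", "dog", "sat"]]`, the vocabulary has five predictable symbols and `trigram_prob(model, "<s>", "<s>", "the")` is (2 + 1)/(2 + 5) = 3/7.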

KoParadigm: A Korean Conjugation Paradigm Generator

Apr 28, 2020
Kyubyong Park

Korean is a morphologically rich language: Korean verbs change their forms in unpredictable ways depending on tense, mood, speech level, meaning, and other factors. Constructing comprehensive conjugation paradigms for Korean verbs is therefore challenging. In this paper we introduce a Korean (verb) conjugation paradigm generator, dubbed KoParadigm. To the best of our knowledge, it is the first Korean conjugation module that covers all contemporary Korean verbs and endings. KoParadigm is not only linguistically well established but also computationally simple and efficient. We share it via PyPI.

Excitation-based Voice Quality Analysis and Modification

Jan 02, 2020
Thomas Drugman, Thierry Dutoit, Baris Bozkurt

This paper investigates the differences that occur in the excitation across voice qualities. Its goal is twofold. First, a large corpus containing three voice qualities (modal, soft and loud) uttered by the same speaker is analyzed, and significant differences are observed in the characteristics extracted from the excitation. Second, modification rules derived from this analysis are used to build a voice quality transformation system, applied as a post-process to HMM-based speech synthesis. The system is shown to achieve the transformations effectively while maintaining the delivered quality.

Contrastive Predictive Coding Based Feature for Automatic Speaker Verification

Apr 01, 2019
Cheng-I Lai

This thesis describes our ongoing work on Contrastive Predictive Coding (CPC) features for speaker verification. CPC is a recently proposed representation learning framework based on predictive coding and noise contrastive estimation. We focus on incorporating CPC features into standard automatic speaker verification systems, and we present our methods, experiments, and analysis. This thesis also covers the necessary background: past and recent work on automatic speaker verification systems, conventional speech features, and the motivation and techniques behind CPC.
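
The noise contrastive estimation part of CPC is usually implemented as the InfoNCE loss: score a context vector against one positive (the true future representation) and several negatives, then take the negative log-softmax of the positive score. The sketch below assumes a plain dot-product score, whereas CPC itself uses a learned bilinear map; `info_nce_loss` is a hypothetical name.

```python
import math
from typing import List, Sequence


def info_nce_loss(context: Sequence[float],
                  positive: Sequence[float],
                  negatives: List[Sequence[float]]) -> float:
    """InfoNCE loss for one context vector.

    ASSUMPTION: score(z) = z . context (identity in place of CPC's learned W_k).
    """
    def score(z: Sequence[float]) -> float:
        return sum(a * b for a, b in zip(z, context))

    scores = [score(positive)] + [score(z) for z in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    mx = max(scores)
    log_denom = mx + math.log(sum(math.exp(s - mx) for s in scores))
    return log_denom - scores[0]  # equals -log softmax(scores)[0]
```

When all scores are equal the loss is log(N + 1) for N negatives; training lowers it by making the positive score stand out.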

A Gentle Tutorial of Recurrent Neural Network with Error Backpropagation

Jan 14, 2018
Gang Chen

We describe recurrent neural networks (RNNs), which have attracted great attention for sequential tasks such as handwriting recognition, speech recognition, and image-to-text conversion. Unlike general feedforward neural networks, RNNs have feedback loops, which make the backpropagation step harder to understand. We therefore focus on the basics, especially error backpropagation for computing gradients with respect to the model parameters. Further, we describe in detail how the error backpropagation algorithm is applied to long short-term memory (LSTM) by unfolding the memory unit.

* 9 pages 
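
The core idea, backpropagation through the unfolded feedback loop, fits in a few lines for a scalar RNN h_t = tanh(w·h_{t−1} + u·x_t) with loss ½(h_T − y)². This toy model and its function names are illustrative assumptions, not the tutorial's notation; the gradient is checked against central finite differences.

```python
import math
from typing import List, Tuple


def forward(w: float, u: float, xs: List[float], h0: float = 0.0) -> List[float]:
    """Unroll the scalar RNN; hs[t] is the hidden state after input xs[t - 1]."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs


def loss(w: float, u: float, xs: List[float], y: float) -> float:
    return 0.5 * (forward(w, u, xs)[-1] - y) ** 2


def bptt_grads(w: float, u: float, xs: List[float], y: float) -> Tuple[float, float]:
    """Backpropagation through time: walk the unrolled graph from t = T to 1."""
    hs = forward(w, u, xs)
    dh = hs[-1] - y  # dL/dh_T
    dw = du = 0.0
    for t in range(len(xs), 0, -1):
        da = dh * (1.0 - hs[t] ** 2)  # through tanh: h_t = tanh(a_t)
        dw += da * hs[t - 1]          # a_t = w * h_{t-1} + u * x_t
        du += da * xs[t - 1]
        dh = da * w                   # send the gradient back to h_{t-1}
    return dw, du
```

The shared weight w collects one gradient contribution per time step, which is exactly what the feedback loop adds compared with a feedforward network.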

Generative Adversarial Source Separation

Oct 30, 2017
Cem Subakan, Paris Smaragdis

Generative source separation methods such as non-negative matrix factorization (NMF) or auto-encoders rely on an assumed form for the output probability density. Generative Adversarial Networks (GANs) can learn data distributions without needing a parametric assumption on the output density. On a speech source separation experiment, we show that a multi-layer perceptron trained with a Wasserstein-GAN formulation outperforms NMF, auto-encoders trained with maximum likelihood, and variational auto-encoders in terms of source-to-distortion ratio.
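
The comparison is reported as source-to-distortion ratio (SDR). A simplified version treats every deviation from the reference as distortion. Note that the BSS Eval SDR used in the separation literature first decomposes the estimate into allowed and disallowed distortions, so its values differ from this sketch; `simple_sdr` is a hypothetical name.

```python
import math
from typing import Sequence


def simple_sdr(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    ASSUMPTION: no BSS Eval projection; all of (s - s_hat) counts as distortion.
    """
    signal = sum(s * s for s in reference)
    error = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal / error)
```

For instance, `simple_sdr([1.0, 0.0], [1.0, 0.1])` has error energy 0.01 against signal energy 1, giving about 20 dB.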

Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

Dec 11, 2014
Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, Yoshua Bengio

In this paper we compare different types of recurrent units in recurrent neural networks (RNNs). In particular, we focus on more sophisticated units that implement a gating mechanism, such as the long short-term memory (LSTM) unit and the recently proposed gated recurrent unit (GRU). We evaluate these recurrent units on polyphonic music modeling and speech signal modeling. Our experiments revealed that these advanced recurrent units are indeed better than more traditional recurrent units such as tanh units. We also found the GRU to be comparable to the LSTM.

* Presented at the NIPS 2014 Deep Learning and Representation Learning Workshop 
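
To make the gating mechanism concrete, here is a minimal GRU step on plain Python lists. It assumes the interpolation convention h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t (some references swap the roles of z_t and 1 − z_t) and omits bias terms for brevity; `gru_step` and its parameter names are illustrative.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x: Vector, h: Vector,
             Wz: Matrix, Uz: Matrix, Wr: Matrix, Ur: Matrix,
             W: Matrix, U: Matrix) -> Vector:
    """One GRU update (ASSUMPTION: no biases)."""
    # Update gate z: how much of the candidate state to take.
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]
    # Reset gate r: how much of the previous state feeds the candidate.
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    h_tilde = [math.tanh(a + b) for a, b in zip(matvec(W, x), matvec(U, rh))]
    # Interpolate between the previous state and the candidate.
    return [(1.0 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]
```

With all weights at zero, both gates sit at 0.5 and the candidate is 0, so the unit simply halves its previous state; learned gates let it instead keep or overwrite memory selectively.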

An Efficient Compiler for Weighted Rewrite Rules

Jun 20, 1996
Mehryar Mohri, Richard Sproat

Context-dependent rewrite rules are used in many areas of natural language and speech processing. Work in computational phonology has demonstrated that, under certain conditions, such rewrite rules can be represented as finite-state transducers (FSTs). We describe a new algorithm for compiling rewrite rules into FSTs and show it to be simpler and more efficient than existing algorithms. Further, many of our applications require compiling weighted rules into weighted FSTs, that is, transducers whose transitions carry weights. We have extended the algorithm to allow for this.

* 34th Annual Meeting of the ACL 
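
To illustrate what a compiled weighted rule computes (not the compilation algorithm itself), the sketch below applies a weighted FST to a string in the tropical semiring: path weights add, and the cheapest complete path wins. The arc format, the toy transducer that rewrites "a" as "b" at lower cost than keeping it, and `best_transduction` are all hypothetical, and epsilon transitions are assumed absent.

```python
from typing import Dict, List, Optional, Tuple

# Arc: (input_symbol, output_symbol, weight, next_state)
Arc = Tuple[str, str, float, int]


def best_transduction(arcs: Dict[int, List[Arc]], finals: Dict[int, float],
                      start: int, text: str) -> Optional[Tuple[float, str]]:
    """Cheapest (cost, output) for `text`, or None if it is not accepted.

    ASSUMPTION: no epsilon arcs, so each step consumes one input symbol.
    """
    # best[state] = (cost, output) of the cheapest path reaching `state` so far.
    best: Dict[int, Tuple[float, str]] = {start: (0.0, "")}
    for ch in text:
        nxt: Dict[int, Tuple[float, str]] = {}
        for state, (cost, out) in best.items():
            for inp, outsym, weight, dest in arcs.get(state, []):
                if inp != ch:
                    continue
                cand = (cost + weight, out + outsym)  # tropical: weights add
                if dest not in nxt or cand[0] < nxt[dest][0]:
                    nxt[dest] = cand
        best = nxt
    # Add final-state weights and keep the minimum (tropical "sum").
    results = [(cost + finals[s], out) for s, (cost, out) in best.items() if s in finals]
    return min(results) if results else None
```

With arcs `{0: [("a", "a", 1.0, 0), ("a", "b", 0.5, 0), ("b", "b", 0.0, 0)]}` and final weights `{0: 0.0}`, the input "ab" maps to "bb" at cost 0.5, because the rewrite arc is cheaper than the identity arc.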

Disentangling Active and Passive Cosponsorship in the U.S. Congress

May 19, 2022
Giuseppe Russo, Christoph Gote, Laurence Brandenberger, Sophia Schlosser, Frank Schweitzer

In the U.S. Congress, legislators can use active and passive cosponsorship to support bills. We show that these two types of cosponsorship are driven by two different motivations: backing political colleagues and backing the bill's content. To this end, we develop an Encoder+RGCN-based model that learns legislator representations from bill texts and speech transcripts. These representations predict active and passive cosponsorship with an F1-score of 0.88. Applying our representations to predict voting decisions, we show that they are interpretable and generalize to unseen tasks.

* 20 pages, 10 figures, 6 tables 
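
The reported F1-score is the harmonic mean of precision and recall. The abstract does not say whether 0.88 is a binary, macro, or micro average, so the sketch below shows only the binary case for a single positive class; `f1_score` here is a hypothetical stand-alone helper.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 = 2PR / (P + R), treating label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For `y_true = [1, 1, 0, 0, 1]` and `y_pred = [1, 0, 0, 1, 1]` there are 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 are all 2/3.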
