"speech": models, code, and papers

Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics

Dec 28, 2019
Thomas Drugman, Abeer Alwan

This paper focuses on the problem of pitch tracking in noisy conditions. A method using harmonic information in the residual signal is presented. The proposed criterion is used both for pitch estimation and for determining the voicing segments of speech. In the experiments, the method is compared to six state-of-the-art pitch trackers on the Keele and CSTR databases. The proposed technique is shown to be particularly robust to additive noise, leading to a significant improvement in adverse conditions.
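As a rough illustration of how a residual-harmonics criterion can serve both decisions, the sketch below scores each candidate pitch by the spectral amplitude at its harmonics minus the amplitude midway between them. The exact formula, the function name `srh_pitch`, the search range, and the number of harmonics are assumptions rather than details given in the abstract. The inverse filtering that would produce the residual is skipped, and a synthetic harmonic signal stands in for it.

```python
import numpy as np

def srh_pitch(residual, fs, f0_min=50, f0_max=400, n_harm=5):
    """Return (f0_hz, score) for one frame of an (assumed) residual signal."""
    n = len(residual)
    spec = np.abs(np.fft.rfft(residual * np.hanning(n)))
    spec /= np.linalg.norm(spec) + 1e-12          # normalize the amplitude spectrum
    hz_per_bin = fs / n

    def amp(freq_hz):
        idx = int(round(freq_hz / hz_per_bin))
        return spec[idx] if idx < len(spec) else 0.0

    best_f0, best_score = None, -np.inf
    for f in range(f0_min, f0_max + 1):           # 1 Hz candidate grid (assumption)
        score = amp(f)
        for k in range(2, n_harm + 1):
            # reward energy at harmonic k, penalize energy between harmonics
            score += amp(k * f) - amp((k - 0.5) * f)
        if score > best_score:
            best_f0, best_score = f, score
    return best_f0, best_score

# Synthetic stand-in for a voiced residual: ten harmonics of 120 Hz.
fs = 8000
t = np.arange(fs) / fs                            # one second, so 1 Hz per FFT bin
signal = sum(np.sin(2 * np.pi * 120 * k * t) for k in range(1, 11))
f0, score = srh_pitch(signal, fs)
# A voicing decision could compare `score` against a threshold (hypothetical).
```

Subtracting the between-harmonic amplitudes is what discourages octave errors here: a candidate at twice the true pitch finds real harmonics at its half-way points and is penalized for them.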

Development of a 3D tongue motion visualization platform based on ultrasound image sequences

May 19, 2016
Kele Xu, Yin Yang, Aurore Jaumard-Hakoun, Clemence Leboullenger, Gerard Dreyfus, Pierre Roussel, Maureen Stone, Bruce Denby

This article describes the development of a platform designed to visualize the 3D motion of the tongue using ultrasound image sequences. An overview of the system design is given and promising results are presented. Compared to the analysis of motion in 2D image sequences, such a system can provide additional visual information and a quantitative description of the tongue 3D motion. The platform can be useful in a variety of fields, such as speech production, articulation training, etc.

* 5 Pages, 5 figures, published in 18th International Congress of Phonetic Sciences, 2015 

Recurrent Neural Network Regularization

Feb 19, 2015
Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals

We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs. In this paper, we show how to correctly apply dropout to LSTMs, and show that it substantially reduces overfitting on a variety of tasks. These tasks include language modeling, speech recognition, image caption generation, and machine translation.
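The abstract does not spell out the scheme. The sketch below assumes dropout is applied only to the non-recurrent (input-to-hidden) connection, leaving the hidden-to-hidden path untouched so that the memory carried across time steps is not corrupted. The function name `lstm_step`, the gate ordering, and all sizes are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b, rng, keep_prob=0.5, train=True):
    """One LSTM step with dropout on the input only (assumed scheme)."""
    if train:
        mask = rng.random(x.shape) < keep_prob
        x = x * mask / keep_prob                  # inverted dropout on the input
    # h_prev is deliberately NOT masked: the recurrent path stays intact.
    gates = W @ x + U @ h_prev + b
    i, f, o, g = np.split(gates, 4)               # input, forget, output, candidate
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(size=(4 * n_hid, n_in)) * 0.1
U = rng.normal(size=(4 * n_hid, n_hid)) * 0.1
b = np.zeros(4 * n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):              # a toy sequence of five inputs
    h, c = lstm_step(x, h, c, W, U, b, rng)
```

At evaluation time, calling `lstm_step(..., train=False)` disables the mask; the division by `keep_prob` during training keeps the expected input scale the same in both modes.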

Sequence Modeling using Gated Recurrent Neural Networks

Jan 01, 2015
Mohammad Pezeshki

In this paper, we use Recurrent Neural Networks to capture and model human motion data and to generate motions by predicting the next data point at each time step. Our RNN is equipped with the recently proposed Gated Recurrent Units, which have shown promising results in sequence modeling problems such as machine translation and speech synthesis. We demonstrate that this model is able to capture long-term dependencies in data and generate realistic motions.
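Generation by next-step prediction can be sketched as follows, using the standard GRU update and feeding each predicted frame back in as the next input. The weights are random and untrained, and the parameter names, the linear readout `Wo`, and the frame dimension are assumptions made only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, p):
    """Standard GRU update: gates decide how much of the old state to keep."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev)              # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev)              # reset gate
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev))  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde

def generate(seed_frame, steps, p):
    """Roll the model forward, feeding each prediction back as input."""
    h = np.zeros(p["Uz"].shape[0])
    x = seed_frame
    frames = []
    for _ in range(steps):
        h = gru_step(x, h, p)
        x = p["Wo"] @ h                  # linear readout predicts the next frame
        frames.append(x)
    return np.stack(frames)

rng = np.random.default_rng(0)
n_dim, n_hid = 6, 8                      # hypothetical pose and hidden sizes
p = {name: rng.normal(size=(n_hid, n_dim)) * 0.1 for name in ("Wz", "Wr", "Wh")}
p.update({name: rng.normal(size=(n_hid, n_hid)) * 0.1 for name in ("Uz", "Ur", "Uh")})
p["Wo"] = rng.normal(size=(n_dim, n_hid)) * 0.1

frames = generate(rng.normal(size=n_dim), steps=10, p=p)
```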

A Compact Architecture for Dialogue Management Based on Scripts and Meta-Outputs

Jun 09, 2000
Manny Rayner, Beth Ann Hockey, Frankie James

We describe an architecture for spoken dialogue interfaces to semi-autonomous systems that transforms speech signals through successive representations of linguistic, dialogue, and domain knowledge. Each step produces an output, and a meta-output describing the transformation, with an executable program in a simple scripting language as the final result. The output/meta-output distinction permits perspicuous treatment of diverse tasks such as resolving pronouns, correcting user misconceptions, and optimizing scripts.

* Language Technology Joint Conference ANLP-NAACL 2000. 29 April - 4 May 2000, Seattle, WA 

Practical experiments with regular approximation of context-free languages

Oct 25, 1999
Mark-Jan Nederhof

Several methods are discussed that construct a finite automaton given a context-free grammar, including both methods that lead to subsets and those that lead to supersets of the original context-free language. Some of these methods of regular approximation are new, and some others are presented here in a more refined form with respect to existing literature. Practical experiments with the different methods of regular approximation are performed for spoken-language input: hypotheses from a speech recognizer are filtered through a finite automaton.

* 28 pages. To appear in Computational Linguistics 26(1), March 2000 
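To make the filtering idea concrete, the toy sketch below takes the context-free language a^n b^n, approximates it by the regular superset a*b*, and passes a list of hypotheses through the resulting finite automaton. The grammar, the automaton, and the hypothesis strings are all illustrative assumptions and are not taken from the paper's experiments.

```python
def make_superset_dfa():
    """DFA for a*b*, a regular superset of the context-free language a^n b^n."""
    # State 0: still reading a's. State 1: reading b's. Missing entry = reject.
    transitions = {(0, "a"): 0, (0, "b"): 1, (1, "b"): 1}
    return transitions, 0, {0, 1}

def accepts(dfa, symbols):
    transitions, state, accepting = dfa
    for sym in symbols:
        if (state, sym) not in transitions:
            return False
        state = transitions[(state, sym)]
    return state in accepting

dfa = make_superset_dfa()
hypotheses = ["aabb", "abab", "aab", "ba"]       # stand-ins for recognizer output
kept = [h for h in hypotheses if accepts(dfa, h)]
print(kept)  # ['aabb', 'aab']
```

Because a*b* is a superset, "aab" survives the filter even though it is not in a^n b^n; a subset approximation would err in the opposite direction and reject some strings the grammar allows.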

Transducers from Rewrite Rules with Backreferences

Apr 15, 1999
Dale Gerdemann, Gertjan van Noord

Context sensitive rewrite rules have been widely used in several areas of natural language processing, including syntax, morphology, phonology and speech processing. Kaplan and Kay, Karttunen, and Mohri & Sproat have given various algorithms to compile such rewrite rules into finite-state transducers. The present paper extends this work by allowing a limited form of backreferencing in such rules. The explicit use of backreferencing leads to more elegant and general solutions.

* 8 pages, EACL 1999 Bergen 

Robust stochastic parsing using the inside-outside algorithm

Dec 19, 1994
Ted Briscoe, Nick Waegner

The paper describes a parser of sequences of (English) part-of-speech labels which utilises a probabilistic grammar trained using the inside-outside algorithm. The initial (meta)grammar is defined by a linguist, and further rules compatible with metagrammatical constraints are automatically generated. During training, rules with very low probability are rejected, yielding a wide-coverage parser capable of ranking alternative analyses. A series of corpus-based experiments describes the parser's performance.

* Revised and updated version of paper from AAAI Workshop on Probabilistically-based Natural Language Processing Techniques, 1992, 16 pages, uuencoded, compressed postscript 

Use of Machine Learning Technique to maximize the signal over background for $H \rightarrow \tau\tau$

Jul 07, 2021
Kanhaiya Gupta

In recent years, artificial neural networks (ANNs) have won numerous contests in pattern recognition and machine learning. ANNs have been applied to problems ranging from speech recognition to prediction of protein secondary structure, classification of cancers, and gene prediction. Here, we intend to maximize the chances of finding Higgs boson decays to two $\tau$ leptons in the pseudo dataset, using a machine learning technique to classify the recorded events as signal or background.

* 9 pages, 14 figures 
