"speech": models, code, and papers

Embodiment of Learning in Electro-Optical Signal Processors

Oct 27, 2016
Michiel Hermans, Piotr Antonik, Marc Haelterman, Serge Massar

Delay-coupled electro-optical systems have received much attention for their dynamical properties and their potential use in signal processing. In particular, it has recently been demonstrated, using the artificial intelligence algorithm known as reservoir computing, that photonic implementations of such systems solve complex tasks such as speech recognition. Here we show how the backpropagation algorithm can be physically implemented on the same electro-optical delay-coupled architecture used for computation with only minor changes to the original design. We find that, compared to when the backpropagation algorithm is not used, the error rate of the resulting computing device, evaluated on three benchmark tasks, decreases considerably. This demonstrates that electro-optical analog computers can embody a large part of their own training process, allowing them to be applied to new, more difficult tasks.

* Physical Review Letters 117, 128301 (2016) 
* Main text (5 pages, 2 figures) merged with the supplementary material (8 pages, 5 figures) 

TheanoLM - An Extensible Toolkit for Neural Network Language Modeling

Aug 08, 2016
Seppo Enarvi, Mikko Kurimo

We present a new tool for training neural network language models (NNLMs), scoring sentences, and generating text. The tool has been written using the Python library Theano, which allows researchers to easily extend it and tune any aspect of the training process. Despite this flexibility, Theano is able to generate extremely fast native code that can utilize a GPU or multiple CPU cores in order to parallelize the heavy numerical computations. The tool has been evaluated in difficult Finnish and English conversational speech recognition tasks, and significant improvement was obtained over our best back-off n-gram models. The results that we obtained in the Finnish task were compared to those from the existing RNNLM and RWTHLM toolkits, and were found to be as good or better, while training times were an order of magnitude shorter.

* Proc. Interspeech 2016, pp. 3052-3056 

Keyphrase Extraction using Sequential Labeling

Aug 03, 2016
Sujatha Das Gollapalli, Xiao-li Li

Keyphrases efficiently summarize a document's content and are used in various document processing and retrieval tasks. Several unsupervised techniques and classifiers exist for extracting keyphrases from text documents. Most of these methods operate at the phrase level and rely on part-of-speech (POS) filters for candidate phrase generation. In addition, they do not directly handle keyphrases of varying lengths. In this paper, we overcome these modeling shortcomings by addressing keyphrase extraction as a sequential labeling task. We explore a basic set of features commonly used in NLP tasks as well as predictions from various unsupervised methods to train our taggers. In addition to a more natural modeling of the keyphrase extraction problem, we show that tagging models yield significant performance benefits over existing state-of-the-art extraction methods.

* 10 pages including 2 pages of references, 6 figures 
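
To make the idea of treating keyphrase extraction as sequential labeling concrete, the sketch below converts a tokenized sentence and a set of gold keyphrases into one label per token. The BIO tag scheme, the function name `bio_labels`, and the longest-phrase-first matching are assumptions made for illustration; the abstract does not state which label set or matching procedure the authors use.

```python
def bio_labels(tokens, keyphrases):
    """Assign B (begin), I (inside), or O (outside) to each token.

    `keyphrases` is a list of token lists. Matching is case-insensitive,
    and longer phrases are matched first so they are not split by shorter ones.
    """
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for phrase in sorted(keyphrases, key=len, reverse=True):
        target = [w.lower() for w in phrase]
        n = len(target)
        for i in range(len(tokens) - n + 1):
            span_is_free = all(label == "O" for label in labels[i:i + n])
            if lowered[i:i + n] == target and span_is_free:
                labels[i] = "B"
                for j in range(i + 1, i + n):
                    labels[j] = "I"
    return labels


tokens = "We study keyphrase extraction as sequential labeling".split()
gold = [["keyphrase", "extraction"], ["sequential", "labeling"]]
print(list(zip(tokens, bio_labels(tokens, gold))))
```

Because each token carries its own label, phrases of any length are handled uniformly, which is the property the abstract contrasts with phrase-level candidate generation.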

Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network

Apr 17, 2015
Andrew J. R. Simpson, Gerard Roma, Mark D. Plumbley

Identification and extraction of singing voice from within musical mixtures is a key challenge in source separation and machine audition. Recently, deep neural networks (DNNs) have been used to estimate 'ideal' binary masks for carefully controlled cocktail party speech separation problems. However, it is not yet known whether these methods are capable of generalizing to the discrimination of voice and non-voice in the context of musical mixtures. Here, we trained a convolutional DNN (of around a billion parameters) to provide probabilistic estimates of the ideal binary mask for separation of vocal sounds from real-world musical mixtures. We contrast our DNN results with more traditional linear methods. Our approach may be useful for automatic removal of vocal sounds from musical mixtures for 'karaoke' type applications.
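
For readers unfamiliar with the term, the following sketch shows what an ideal binary mask is, using toy magnitude spectrograms stored as nested lists (rows are frequency bins, columns are time frames). It only illustrates the mask that a network would be trained to estimate, not the paper's convolutional DNN. The function names, the strict "vocal louder than accompaniment" threshold, and the toy values are assumptions, and treating the mixture as the element-wise sum of magnitudes is a simplification.

```python
def ideal_binary_mask(vocal_mag, accomp_mag):
    """Return 1 for each time-frequency bin where the vocal magnitude
    exceeds the accompaniment magnitude, and 0 elsewhere."""
    return [
        [1 if v > a else 0 for v, a in zip(v_row, a_row)]
        for v_row, a_row in zip(vocal_mag, accomp_mag)
    ]


def apply_mask(mixture_mag, mask):
    """Keep only the mixture bins selected by the mask."""
    return [
        [m * k for m, k in zip(m_row, k_row)]
        for m_row, k_row in zip(mixture_mag, mask)
    ]


# Toy 2x2 spectrograms (hypothetical values).
vocal = [[9, 1], [2, 8]]
accomp = [[3, 5], [6, 1]]
mixture = [[v + a for v, a in zip(vr, ar)] for vr, ar in zip(vocal, accomp)]

mask = ideal_binary_mask(vocal, accomp)
vocal_estimate = apply_mask(mixture, mask)
```

Inverting the mask (1 minus each entry) would keep the accompaniment instead, which corresponds to the karaoke use case mentioned in the abstract.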

Domain adaptation for sequence labeling using hidden Markov models

Dec 14, 2013
Edouard Grave, Guillaume Obozinski, Francis Bach

Most natural language processing systems based on machine learning are not robust to domain shift. For example, a state-of-the-art syntactic dependency parser trained on Wall Street Journal sentences has an absolute drop in performance of more than ten points when tested on textual data from the Web. An efficient solution to make these methods more robust to domain shift is to first learn a word representation using large amounts of unlabeled data from both domains, and then use this representation as features in a supervised learning algorithm. In this paper, we propose to use hidden Markov models to learn word representations for part-of-speech tagging. In particular, we study the influence of using data from the source, the target or both domains to learn the representation and the different ways to represent words using an HMM.

* New Directions in Transfer and Multi-Task: Learning Across Domains and Tasks (NIPS Workshop) (2013) 

Linguistic complexity: English vs. Polish, text vs. corpus

Jul 06, 2010
Jaroslaw Kwapien, Stanislaw Drozdz, Adam Orczyk

We analyze the rank-frequency distributions of words in selected English and Polish texts. We show that for the lemmatized (basic) word forms the scale-invariant regime breaks after about two decades, while it might be consistent for the whole range of ranks for the inflected word forms. We also find that for a corpus consisting of texts written by different authors the basic scale-invariant regime is broken more strongly than in the case of a comparable corpus consisting of texts written by the same author. Similarly, for a corpus consisting of texts translated into Polish from other languages the scale-invariant regime is broken more strongly than for a comparable corpus of native Polish texts. Moreover, we find that if the words are tagged with their proper part of speech, only verbs show a rank-frequency distribution that is almost scale-invariant.

* Acta Phys. Pol. A 117, 716-720 (2010) 
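
A rank-frequency distribution of the kind analyzed above can be computed in a few lines. The sketch below counts word occurrences, ranks them by frequency, and estimates the slope of the distribution on log-log axes with an ordinary least-squares fit; a scale-invariant (power-law) regime appears as a straight line there. The function names and the least-squares estimator are assumptions for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, frequency) pairs, with rank 1 for the most frequent word."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(freqs, start=1))


def loglog_slope(pairs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Toy corpus whose frequencies follow f = 12 / rank exactly.
tokens = ["the"] * 12 + ["of"] * 6 + ["and"] * 4 + ["to"] * 3
pairs = rank_frequency(tokens)
slope = loglog_slope(pairs)
```

On real text the fit would normally be restricted to a range of ranks, since the abstract reports that the scale-invariant regime can break down after about two decades.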

Extraction of Keyphrases from Text: Evaluation of Four Algorithms

Dec 08, 2002
Peter D. Turney

This report presents an empirical evaluation of four algorithms for automatically extracting keywords and keyphrases from documents. The four algorithms are compared using five different collections of documents. For each document, we have a target set of keyphrases, which were generated by hand. The target keyphrases were generated for human readers; they were not tailored for any of the four keyphrase extraction algorithms. Each of the algorithms was evaluated by the degree to which the algorithm's keyphrases matched the manually generated keyphrases. The four algorithms were (1) the AutoSummarize feature in Microsoft's Word 97, (2) an algorithm based on Eric Brill's part-of-speech tagger, (3) the Summarize feature in Verity's Search 97, and (4) NRC's Extractor algorithm. For all five document collections, NRC's Extractor yields the best match with the manually generated keyphrases.

* 31 pages, issued 1997 

Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification

Aug 26, 1998
Claire Cardie, David Pierce

Finding simple, non-recursive, base noun phrases is an important subtask for many natural language processing applications. While previous empirical methods for base NP identification have been rather complex, this paper instead proposes a very simple algorithm that is tailored to the relative simplicity of the task. In particular, we present a corpus-based approach for finding base NPs by matching part-of-speech tag sequences. The training phase of the algorithm is based on two successful techniques: first the base NP grammar is read from a "treebank" corpus; then the grammar is improved by selecting rules with high "benefit" scores. Using this simple algorithm with a naive heuristic for matching rules, we achieve surprising accuracy in an evaluation on the Penn Treebank Wall Street Journal.

* Proceedings of COLING-ACL'98, pages 218-224. 
* 7 pages; 2 eps figures; uses epsf, colacl 
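
The matching step described above, finding base NPs by matching part-of-speech tag sequences against a rule set, might look roughly like the sketch below. It scans the tag sequence left to right and, at each position, takes the longest rule that matches. The greedy longest-match heuristic, the function name `find_base_nps`, and the three example rules are assumptions for illustration; the abstract says only that a naive matching heuristic is used, and the rule extraction and benefit-based pruning stages are not shown.

```python
def find_base_nps(tags, rules):
    """Return (start, end) spans, end exclusive, of base NPs in `tags`.

    `rules` is a non-empty collection of POS tag sequences that count
    as base noun phrases.
    """
    rule_set = {tuple(rule) for rule in rules}
    max_len = max(len(rule) for rule in rule_set)
    spans = []
    i = 0
    while i < len(tags):
        # Try the longest candidate first, then progressively shorter ones.
        for n in range(min(max_len, len(tags) - i), 0, -1):
            if tuple(tags[i:i + n]) in rule_set:
                spans.append((i, i + n))
                i += n
                break
        else:
            i += 1  # no rule matches at this position
    return spans


# Hypothetical rules and a tagged sentence such as "the big dog chased a cat".
rules = [["DT", "NN"], ["DT", "JJ", "NN"], ["NN"]]
tags = ["DT", "JJ", "NN", "VBD", "DT", "NN"]
spans = find_base_nps(tags, rules)
```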

Dialogos: a Robust System for Human-Machine Spoken Dialogue on the Telephone

Dec 20, 1996
Dario Albesano, Paolo Baggia, Morena Danieli, Roberto Gemello, Elisabetta Gerbino, Claudio Rullent

This paper presents Dialogos, a real-time system for human-machine spoken dialogue on the telephone in task-oriented domains. The system has been tested in a large trial with inexperienced users, and it has proved robust enough to allow spontaneous interactions both for users who get good recognition performance and for those who get lower scores. The robust behavior of the system has been achieved by combining the use of specific language models during the recognition phase of analysis, tolerance toward spontaneous speech phenomena, the activity of a robust parser, and the use of pragmatic-based dialogue knowledge. This integration of the different modules makes it possible to deal with partial or total breakdowns of the different levels of analysis. We report the field trial data of the system and the evaluation results of the overall system and of the submodules.

* 4 pages, LaTeX, 1 eps figures, uses icassp91.sty, and psfig.tex; to appear in Proc. of ICASSP 1997, Munich, Germany 

Report of the Study Group on Assessment and Evaluation

Jan 18, 1996
Richard Crouch, Robert Gaizauskas, Klaus Netter

This is an interim report discussing possible guidelines for the assessment and evaluation of projects developing speech and language systems. It was prepared at the request of the European Commission DG XIII by an ad hoc study group, and is now being made available in the form in which it was submitted to the Commission. However, the report is not an official European Commission document, and does not reflect European Commission policy, official or otherwise. After a discussion of terminology, the report focusses on combining user-centred and technology-centred assessment, and on how meaningful comparisons can be made of a variety of systems performing different tasks for different domains. The report outlines the kind of infrastructure that might be required to support comparative assessment and evaluation of heterogeneous projects, and also the results of a questionnaire concerning different approaches to evaluation.

* 83 pages 
