"speech": models, code, and papers

Neural sequence labeling for Vietnamese POS Tagging and NER

Nov 12, 2018
Duong Nguyen Anh, Hieu Nguyen Kiem, Vi Ngo Van

This paper presents a neural architecture for Vietnamese sequence labeling tasks, including part-of-speech (POS) tagging and named entity recognition (NER). We apply the model described in Lample et al. (2016), a combination of bidirectional Long Short-Term Memory and Conditional Random Fields that relies on two sources of information about words: character-based word representations learned from the supervised corpus and pre-trained word embeddings learned from other unannotated corpora. Experiments on benchmark datasets show that this work achieves state-of-the-art performance on both tasks: 93.52% accuracy for POS tagging and 94.88% F1 for NER. Our source code is available here.

* 5 pages 

82 Treebanks, 34 Models: Universal Dependency Parsing with Multi-Treebank Models

Sep 06, 2018
Aaron Smith, Bernd Bohnet, Miryam de Lhoneux, Joakim Nivre, Yan Shao, Sara Stymne

We present the Uppsala system for the CoNLL 2018 Shared Task on universal dependency parsing. Our system is a pipeline consisting of three components: the first performs joint word and sentence segmentation; the second predicts part-of-speech tags and morphological features; the third predicts dependency trees from words and tags. Instead of training a single parsing model for each treebank, we trained models with multiple treebanks for one language or closely related languages, greatly reducing the number of models. On the official test run, we ranked 7th of 27 teams for the LAS and MLAS metrics. Our system obtained the best scores overall for word segmentation, universal POS tagging, and morphological features.

* Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies 

Simplified End-to-End MMI Training and Voting for ASR

Jul 16, 2017
Lior Fritz, David Burshtein

A simplified speech recognition system that uses the maximum mutual information (MMI) criterion is considered. End-to-end training using gradient descent is suggested, similar to the training of connectionist temporal classification (CTC). We use an MMI criterion with a simple language model in the training stage, and a standard HMM decoder. Our method compares favorably to CTC in terms of performance, robustness, decoding time, disk footprint and quality of alignments. The good alignments enable the use of a straightforward ensemble method, obtained by simply averaging the predictions of several neural network models that were trained separately end-to-end. The ensemble method yields a considerable reduction in the word error rate.

Discovering Sound Concepts and Acoustic Relations In Text

Feb 13, 2017
Anurag Kumar, Bhiksha Raj, Ndapandula Nakashole

In this paper we describe approaches for discovering acoustic concepts and relations in text. The first major goal is to identify text phrases that contain a notion of audibility and can be termed a sound or an acoustic concept. We also propose a method to define an acoustic scene through a set of sound concepts. We use pattern matching and part-of-speech tags to generate sound concepts from large-scale text corpora. We use dependency parsing and an LSTM recurrent neural network to predict a set of sound concepts for a given acoustic scene. These methods are not only helpful in creating an acoustic knowledge base but can also directly help acoustic event and scene detection research in the future.

* ICASSP 2017 

Interpreting the Predictions of Complex ML Models by Layer-wise Relevance Propagation

Nov 24, 2016
Wojciech Samek, Grégoire Montavon, Alexander Binder, Sebastian Lapuschkin, Klaus-Robert Müller

Complex nonlinear models such as deep neural networks (DNNs) have become an important tool for image classification, speech recognition, natural language processing, and many other fields of application. These models, however, lack transparency due to their complex nonlinear structure and to the complex data distributions to which they typically apply. As a result, it is difficult to fully characterize what makes these models reach a particular decision for a given input. This lack of transparency can be a drawback, especially in the context of sensitive applications such as medical analysis or security. In this short paper, we summarize a recent technique introduced by Bach et al. [1] that explains predictions by decomposing the classification decision of DNN models in terms of input variables.

* Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems 

Semi-supervised Learning with Sparse Autoencoders in Phone Classification

Oct 03, 2016
Akash Kumar Dhaka, Giampiero Salvi

We propose the application of a semi-supervised learning method to improve the performance of acoustic modelling for automatic speech recognition based on deep neural networks. As opposed to unsupervised initialisation followed by supervised fine tuning, our method takes advantage of both unlabelled and labelled data simultaneously through mini-batch stochastic gradient descent. We tested the method with varying proportions of labelled vs unlabelled observations in frame-based phoneme classification on the TIMIT database. Our experiments show that the method outperforms standard supervised training for an equal amount of labelled data and provides competitive error rates compared to state-of-the-art graph-based semi-supervised learning techniques.

* 5 pages, 1 figure, 2 tables 

A Short Survey on Data Clustering Algorithms

Nov 25, 2015
Ka-Chun Wong

With rapidly increasing data, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains, for instance bioinformatics, speech recognition, and financial analysis. Formally speaking, given a set of data instances, a clustering algorithm is expected to divide the set into subsets that maximize the intra-subset similarity and the inter-subset dissimilarity, where a similarity measure is defined beforehand. In this work, state-of-the-art clustering algorithms are reviewed from design concept to methodology. Different clustering paradigms are discussed. Advanced clustering algorithms are also discussed. After that, the existing clustering evaluation metrics are reviewed. A summary with future insights is provided at the end.

The Modular Audio Recognition Framework (MARF) and its Applications: Scientific and Software Engineering Notes

Jul 25, 2009
Serguei A. Mokhov, Stephen Sinclair, Ian Clément, Dimitrios Nicolacopoulos, for the MARF R&D Group

MARF is an open-source research platform and a collection of voice/sound/speech/text and natural language processing (NLP) algorithms written in Java and arranged into a modular and extensible framework that facilitates the addition of new algorithms. MARF can run in a distributed fashion over the network and may act as a library in applications or be used as a source for learning and extension. A few example applications are provided to show how to use the framework. There is an API reference in the Javadoc format as well as this set of accompanying notes with a detailed description of the architectural design, algorithms, and applications. MARF and its applications are released under a BSD-style license and are hosted online. This document provides the details of and insight into the internals of MARF and some of the mentioned applications.

* v2: add missing .ind file for index; 224 pages, 40 figures, 19 tables; index. A comprehensive description of AI and PR algorithms and data structures, software engineering design and implementation, and experiments. Source revision is maintained in the CVS at 

Exploiting Context When Learning to Classify

Dec 12, 2002
Peter D. Turney

This paper addresses the problem of classifying observations when features are context-sensitive, specifically when the testing set involves a context that is different from the training set. The paper begins with a precise definition of the problem, then general strategies are presented for enhancing the performance of classification algorithms on this type of problem. These strategies are tested on two domains. The first domain is the diagnosis of gas turbine engines. The problem is to diagnose a faulty engine in one context, such as warm weather, when the fault has previously been seen only in another context, such as cold weather. The second domain is speech recognition. The problem is to recognize words spoken by a new speaker, not represented in the training set. For both domains, exploiting context results in substantially more accurate classification.

* Proceedings of the European Conference on Machine Learning, Vienna, Austria, (1993), 402-407 
* 6 pages 

Information Extraction from Broadcast News

Mar 30, 2000
Yoshihiko Gotoh, Steve Renals

This paper discusses the development of trainable statistical models for extracting content from television and radio news broadcasts. In particular, we concentrate on statistical finite state models for identifying proper names and other named entities in broadcast speech. Two models are presented: the first represents name class information as a word attribute; the second represents both word-word and class-class transitions explicitly. A common n-gram based formulation is used for both models. The task of named entity identification is characterized by relatively sparse training data, and issues related to smoothing are discussed. Experiments are reported using the DARPA/NIST Hub-4E evaluation for North American Broadcast News.

* 12 pages, 3 figures, Philosophical Transactions of the Royal Society of London, series A: Mathematical, Physical and Engineering Sciences, vol. 358, 2000 
