"speech": models, code, and papers

A Simple and Accurate Syntax-Agnostic Neural Model for Dependency-based Semantic Role Labeling

Jun 15, 2017
Diego Marcheggiani, Anton Frolov, Ivan Titov

We introduce a simple and accurate neural model for dependency-based semantic role labeling. Our model predicts predicate-argument dependencies relying on states of a bidirectional LSTM encoder. The semantic role labeler achieves competitive performance on English, even without any kind of syntactic information and only using local inference. However, when automatically predicted part-of-speech tags are provided as input, it substantially outperforms all previous local models and approaches the best reported results on the English CoNLL-2009 dataset. We also consider Chinese, Czech and Spanish where our approach also achieves competitive results. Syntactic parsers are unreliable on out-of-domain data, so standard (i.e., syntactically-informed) SRL models are hindered when tested in this setting. Our syntax-agnostic model appears more robust, resulting in the best reported results on standard out-of-domain test sets.

* To appear in CoNLL 2017 


Blending LSTMs into CNNs

Sep 14, 2016
Krzysztof J. Geras, Abdel-rahman Mohamed, Rich Caruana, Gregor Urban, Shengjie Wang, Ozlem Aslan, Matthai Philipose, Matthew Richardson, Charles Sutton

We consider whether deep convolutional networks (CNNs) can represent decision functions with accuracy similar to that of recurrent networks such as LSTMs. First, we show that a deep CNN with an architecture inspired by the models recently introduced in image recognition can yield better accuracy than previous convolutional and LSTM networks on the standard 309h Switchboard automatic speech recognition task. Then we show that even more accurate CNNs can be trained under the guidance of LSTMs using a variant of model compression, which we call model blending because the teacher and student models are similar in complexity but different in inductive bias. Blending further improves the accuracy of our CNN, yielding a computationally efficient model with accuracy higher than that of any of the other individual models. Examining the effect of "dark knowledge" in this model compression task, we find that less than 1% of the highest probability labels are needed for accurate model compression.
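The idea of training a student on only the teacher's highest-probability labels can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say how the discarded probability mass is handled, so renormalizing the kept labels is an assumption, and the function names `top_k_targets` and `soft_cross_entropy` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_targets(teacher_probs, k):
    """Keep the k most probable teacher labels, zero the rest, renormalize.

    Renormalizing is an assumption made for this sketch.
    """
    order = sorted(range(len(teacher_probs)),
                   key=lambda i: teacher_probs[i], reverse=True)
    keep = set(order[:k])
    mass = sum(teacher_probs[i] for i in keep)
    return [teacher_probs[i] / mass if i in keep else 0.0
            for i in range(len(teacher_probs))]

def soft_cross_entropy(targets, student_logits):
    """Cross-entropy between soft teacher targets and student predictions."""
    probs = softmax(student_logits)
    return -sum(t * math.log(p) for t, p in zip(targets, probs) if t > 0.0)

# Toy teacher distribution over five labels, truncated to the top two.
teacher_probs = softmax([4.0, 2.0, 1.0, 0.5, -1.0])
targets = top_k_targets(teacher_probs, k=2)
loss = soft_cross_entropy(targets, [3.0, 2.5, 0.0, 0.0, 0.0])
```

With thousands of output labels, as in a speech recognition system, keeping under 1% of them per frame would make the stored teacher targets very sparse.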


Hybrid Collaborative Filtering with Autoencoders

Jul 19, 2016
Florian Strub, Jeremie Mary, Romaric Gaudel

Collaborative Filtering aims at exploiting the feedback of users to provide personalised recommendations. Such algorithms look for latent variables in a large sparse matrix of ratings. They can be enhanced by adding side information to tackle the well-known cold start problem. While Neural Networks have had tremendous success in image and speech recognition, they have received less attention in Collaborative Filtering. This is all the more surprising given that Neural Networks are able to discover latent variables in large and heterogeneous datasets. In this paper, we introduce a Collaborative Filtering Neural network architecture, aka CFN, which computes a non-linear Matrix Factorization from sparse rating inputs and side information. We show experimentally on the MovieLens and Douban datasets that CFN outperforms the state of the art and benefits from side information. We provide an implementation of the algorithm as a reusable plugin for Torch, a popular Neural Network framework.
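One common way to train an autoencoder on sparse ratings is to compute the reconstruction error only over the ratings that were actually observed. The sketch below illustrates that idea under stated assumptions: the abstract does not describe CFN's loss, and `masked_mse` is a hypothetical helper, not part of the authors' Torch plugin.

```python
def masked_mse(ratings, reconstruction):
    """Mean squared error over observed ratings only.

    `None` marks a missing rating; missing entries contribute nothing,
    so the model is not pushed to predict an arbitrary fill-in value.
    """
    errors = [(r - p) ** 2
              for r, p in zip(ratings, reconstruction)
              if r is not None]
    return sum(errors) / len(errors) if errors else 0.0

# One user's sparse rating vector and a model's dense reconstruction.
ratings = [5.0, None, 3.0, None]
reconstruction = [4.0, 2.5, 3.0, 1.0]
loss = masked_mse(ratings, reconstruction)
```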


Deep Learning using Linear Support Vector Machines

Feb 21, 2015
Yichuan Tang

Recently, fully-connected and convolutional neural networks have been trained to achieve state-of-the-art performance on a wide variety of tasks such as speech recognition, image classification, natural language processing, and bioinformatics. For classification tasks, most of these "deep learning" models employ the softmax activation function for prediction and minimize cross-entropy loss. In this paper, we demonstrate a small but consistent advantage of replacing the softmax layer with a linear support vector machine. Learning minimizes a margin-based loss instead of the cross-entropy loss. While there have been various combinations of neural nets and SVMs in prior art, our results using L2-SVMs show that simply replacing softmax with linear SVMs gives significant gains on the popular deep learning datasets MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop's face expression recognition challenge.

* Contribution to the ICML 2013 Challenges in Representation Learning Workshop 
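The margin-based loss mentioned above can be illustrated with a squared hinge (L2-SVM) loss applied to the top-layer scores. This is a minimal sketch assuming a one-vs-rest formulation with targets of +1 for the true class and -1 otherwise; the weight regularization term is omitted, and the function names are hypothetical rather than taken from the author's code.

```python
def l2_svm_loss(scores, true_class):
    """Squared hinge loss summed over one-vs-rest classifiers, one example."""
    loss = 0.0
    for k, s in enumerate(scores):
        t = 1.0 if k == true_class else -1.0   # one-vs-rest target
        margin = max(0.0, 1.0 - t * s)         # zero once the margin is met
        loss += margin ** 2
    return loss

def l2_svm_grad(scores, true_class):
    """Gradient of the loss with respect to each score.

    Unlike the plain hinge loss, the squared hinge is differentiable
    everywhere, so this can be backpropagated like a softmax gradient.
    """
    grad = []
    for k, s in enumerate(scores):
        t = 1.0 if k == true_class else -1.0
        margin = max(0.0, 1.0 - t * s)
        grad.append(-2.0 * t * margin)
    return grad

# Three-class example: class 0 is correct and already beyond the margin;
# class 1 violates its margin by 0.5; class 2 is safely rejected.
scores = [2.0, -0.5, -3.0]
loss = l2_svm_loss(scores, true_class=0)
grad = l2_svm_grad(scores, true_class=0)
```

Only the class that violates its margin receives a nonzero gradient, whereas a softmax cross-entropy gradient is nonzero for every class.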


Automatic Segmentation of Broadcast News Audio using Self Similarity Matrix

Mar 27, 2014
Sapna Soni, Ahmed Imran, Sunil Kumar Kopparapu

Generally, audio news broadcast on radio is composed of music, commercials, news from correspondents and recorded statements in addition to the actual news read by the newsreader. When news transcripts are available, automatic segmentation of the audio news broadcast is essential for time-aligning the audio with the text transcription to build frugal speech corpora. We address the problem of identifying the segments of the audio news broadcast that correspond to the news read by the newsreader so that they can be mapped to the text transcripts. The existing techniques produce sub-optimal solutions when used to extract newsreader-read segments. In this paper, we propose a new technique which is able to identify the acoustic change points reliably using an acoustic Self Similarity Matrix (SSM). We describe the two-pass technique in detail and verify its performance on real audio news broadcasts of All India Radio for different languages.

* 4 pages, 5 images 
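A self-similarity matrix compares every feature frame of a recording with every other frame, so a change of speaker or content shows up as a block boundary along the diagonal. The sketch below builds an SSM with cosine similarity and locates a boundary with a checkerboard-kernel novelty score. Both choices are assumptions for illustration: the checkerboard kernel is a commonly used technique swapped in here, and the abstract does not specify the authors' two-pass procedure or their features.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def self_similarity(frames):
    """SSM[i][j] is the similarity between frame i and frame j."""
    return [[cosine(a, b) for b in frames] for a in frames]

def novelty(ssm, half):
    """Checkerboard-kernel novelty along the diagonal of the SSM.

    Similarity within the past and within the future counts positively;
    similarity between past and future counts negatively.
    """
    n = len(ssm)
    scores = [0.0] * n
    for t in range(half, n - half):
        total = 0.0
        for i in range(-half, half):
            for j in range(-half, half):
                sign = 1.0 if (i < 0) == (j < 0) else -1.0
                total += sign * ssm[t + i][t + j]
        scores[t] = total
    return scores

# Toy input: four frames of one "sound" followed by four of another.
frames = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
ssm = self_similarity(frames)
scores = novelty(ssm, half=2)
change_point = max(range(len(scores)), key=lambda t: scores[t])
```

On this toy input the novelty score peaks at frame 4, the first frame of the second segment.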


Performing Nonlinear Blind Source Separation with Signal Invariants

Apr 03, 2009
David N. Levin

Given a time series of multicomponent measurements x(t), the usual objective of nonlinear blind source separation (BSS) is to find a "source" time series s(t), comprised of statistically independent combinations of the measured components. In this paper, the source time series is required to have a density function in (s,ds/dt)-space that is equal to the product of density functions of individual components. This formulation of the BSS problem has a solution that is unique, up to permutations and component-wise transformations. Separability is shown to impose constraints on certain locally invariant (scalar) functions of x, which are derived from local higher-order correlations of the data's velocity dx/dt. The data are separable if and only if they satisfy these constraints, and, if the constraints are satisfied, the sources can be explicitly constructed from the data. The method is illustrated by using it to separate two speech-like sounds recorded with a single microphone.

* 8 pages, 3 figures 


Amélioration des Performances des Systèmes Automatiques de Reconnaissance de la Parole pour la Parole Non Native

Nov 07, 2007
Ghazi Bouselmi, Dominique Fohr, Irina Illina, Jean-Paul Haton

In this article, we present an approach for non-native automatic speech recognition (ASR). We propose two methods to adapt existing ASR systems to non-native accents. The first method is based on the modification of acoustic models through the integration of acoustic models from the mother tongue. The phonemes of the target language are pronounced in a manner similar to the native language of the speakers. We propose to combine the models of confused phonemes so that the ASR system can recognize both concurrent pronunciations. The second method we propose is a refinement of the pronunciation error detection through the introduction of graphemic constraints. Indeed, non-native speakers may rely on the spelling of words when pronouncing them. Thus, the pronunciation errors might depend on the characters composing the words. The average error rate reduction that we observed is 22.5% (relative) in sentence error rate and 34.5% (relative) in word error rate.

* In TAIMA'07, Traitement et Analyse de l'Information : Méthodes et Applications (2007) 


The Copenhagen Corpus of Eye Tracking Recordings from Natural Reading of Danish Texts

Apr 28, 2022
Nora Hollenstein, Maria Barrett, Marina Björnsdóttir

Eye movement recordings from reading are one of the richest signals of human language processing. Corpora of eye movements during reading of contextualized running text are a way of making such records available for natural language processing purposes. Such corpora already exist in some languages. We present CopCo, the Copenhagen Corpus of eye tracking recordings from natural reading of Danish texts. It is the first eye tracking corpus of its kind for the Danish language. CopCo includes 1,832 sentences with 34,897 tokens of Danish text extracted from a collection of speech manuscripts. This first release of the corpus contains eye tracking data from 22 participants. It will be extended continuously with more participants and texts from other genres. We assess the data quality of the recorded eye movements and find that the extracted features are in line with related research. The dataset is available here:

* accepted at LREC 2022 


SNP2Vec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study

Apr 14, 2022
Samuel Cahyawijaya, Tiezheng Yu, Zihan Liu, Tiffany T. W. Mak, Xiaopu Zhou, Nancy Y. Ip, Pascale Fung

Self-supervised pre-training methods have brought remarkable breakthroughs in the understanding of text, image, and speech. Recent developments in genomics have also adopted these pre-training methods for genome understanding. However, they focus only on understanding haploid sequences, which hinders their applicability to understanding genetic variations, also known as single nucleotide polymorphisms (SNPs), which are crucial for genome-wide association studies. In this paper, we introduce SNP2Vec, a scalable self-supervised pre-training approach for understanding SNPs. We apply SNP2Vec to perform long-sequence genomics modeling, and we evaluate the effectiveness of our approach on predicting Alzheimer's disease risk in a Chinese cohort. Our approach significantly outperforms existing polygenic risk score methods and all other baselines, including the model that is trained entirely with haploid sequences. We release our code and dataset on


tPLCnet: Real-time Deep Packet Loss Concealment in the Time Domain Using a Short Temporal Context

Apr 04, 2022
Nils L. Westhausen, Bernd T. Meyer

This paper introduces a real-time time-domain packet loss concealment (PLC) neural network (tPLCnet). It efficiently predicts lost frames from a short context buffer in a sequence-to-one (seq2one) fashion. Because of its seq2one structure, continuous inference of the model is not required, since it can be triggered only when packet loss is actually detected. It is trained on 64h of open-source speech data and packet-loss traces of real calls provided by the Audio PLC Challenge. The model with the lowest complexity described in this paper achieves robust PLC performance and consistent improvements over the zero-filling baseline for all metrics. A configuration with higher complexity was submitted to the PLC Challenge; it shows a performance increase of 1.07 over the zero-filling baseline in terms of PLC-MOS on the blind test set and reaches a competitive 3rd place in the challenge ranking.

* Submitted to Interspeech 2022 
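The triggered, sequence-to-one usage described above can be sketched as a loop that calls a predictor only when a frame is flagged as lost, passing it a short buffer of previously output frames. This is a rough illustration under stated assumptions: `conceal` and its `context` parameter are hypothetical, and `repeat_last` is a trivial stand-in for a learned model, not tPLCnet itself. `zero_fill` corresponds to the zero-filling baseline named in the abstract.

```python
def conceal(frames, lost, predictor, context=3):
    """Replace lost frames; the predictor runs only when a loss is flagged."""
    out = []
    for frame, is_lost in zip(frames, lost):
        if is_lost:
            history = out[-context:]            # short context buffer
            frame = predictor(history, len(frame))
        out.append(list(frame))
    return out

def zero_fill(history, frame_len):
    """Zero-filling baseline: a lost frame becomes silence."""
    return [0.0] * frame_len

def repeat_last(history, frame_len):
    """Hypothetical stand-in for a learned seq2one model."""
    return list(history[-1]) if history else [0.0] * frame_len

# Four two-sample frames; the third one was lost in transmission.
frames = [[0.1, 0.2], [0.3, 0.4], [0.0, 0.0], [0.5, 0.6]]
lost = [False, False, True, False]
baseline = conceal(frames, lost, zero_fill)
patched = conceal(frames, lost, repeat_last)
```

Because the predictor is invoked per lost frame rather than on every frame, the cost of concealment scales with the loss rate instead of the call length.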
