"speech": models, code, and papers

Supervised and Unsupervised Transfer Learning for Question Answering

Apr 21, 2018
Yu-An Chung, Hung-Yi Lee, James Glass

Although transfer learning has been shown to be successful for tasks like object and speech recognition, its applicability to question answering (QA) has yet to be well-studied. In this paper, we conduct extensive experiments to investigate the transferability of knowledge learned from a source QA dataset to a target dataset using two QA models. The performance of both models on a TOEFL listening comprehension test (Tseng et al., 2016) and MCTest (Richardson et al., 2013) is significantly improved via a simple transfer learning technique from MovieQA (Tapaswi et al., 2016). In particular, one of the models achieves the state-of-the-art on all target datasets; for the TOEFL listening comprehension test, it outperforms the previous best model by 7%. Finally, we show that transfer learning is helpful even in unsupervised scenarios when correct answers for target QA dataset examples are not available.

* To appear in NAACL HLT 2018 (long paper) 


Recent Advances in Neural Program Synthesis

Feb 07, 2018
Neel Kant

In recent years, deep learning has made tremendous progress in a number of fields that were previously out of reach for artificial intelligence. The successes in these problems have led researchers to consider the possibilities for intelligent systems to tackle a problem that humans have only recently themselves considered: program synthesis. This challenge is unlike others such as object recognition and speech translation, since its abstract nature and demand for rigor make it difficult even for human minds to attempt. While it is still far from being solved or even competitive with most existing methods, neural program synthesis is a rapidly growing discipline which holds great promise if completely realized. In this paper, we start with exploring the problem statement and challenges of program synthesis. Then, we examine the fascinating evolution of program induction models, along with how they have succeeded, failed and been reimagined since. Finally, we conclude with a contrastive look at program synthesis and future research recommendations for the field.

* 16 pages (without citations); Literature Review 


A Simple and Accurate Syntax-Agnostic Neural Model for Dependency-based Semantic Role Labeling

Jun 15, 2017
Diego Marcheggiani, Anton Frolov, Ivan Titov

We introduce a simple and accurate neural model for dependency-based semantic role labeling. Our model predicts predicate-argument dependencies relying on states of a bidirectional LSTM encoder. The semantic role labeler achieves competitive performance on English, even without any kind of syntactic information and only using local inference. However, when automatically predicted part-of-speech tags are provided as input, it substantially outperforms all previous local models and approaches the best reported results on the English CoNLL-2009 dataset. We also consider Chinese, Czech and Spanish where our approach also achieves competitive results. Syntactic parsers are unreliable on out-of-domain data, so standard (i.e., syntactically-informed) SRL models are hindered when tested in this setting. Our syntax-agnostic model appears more robust, resulting in the best reported results on standard out-of-domain test sets.

* To appear in CoNLL 2017 


Blending LSTMs into CNNs

Sep 14, 2016
Krzysztof J. Geras, Abdel-rahman Mohamed, Rich Caruana, Gregor Urban, Shengjie Wang, Ozlem Aslan, Matthai Philipose, Matthew Richardson, Charles Sutton

We consider whether deep convolutional networks (CNNs) can represent decision functions with similar accuracy as recurrent networks such as LSTMs. First, we show that a deep CNN with an architecture inspired by the models recently introduced in image recognition can yield better accuracy than previous convolutional and LSTM networks on the standard 309h Switchboard automatic speech recognition task. Then we show that even more accurate CNNs can be trained under the guidance of LSTMs using a variant of model compression, which we call model blending because the teacher and student models are similar in complexity but different in inductive bias. Blending further improves the accuracy of our CNN, yielding a computationally efficient model of accuracy higher than any of the other individual models. Examining the effect of "dark knowledge" in this model compression task, we find that less than 1% of the highest probability labels are needed for accurate model compression.


Hybrid Collaborative Filtering with Autoencoders

Jul 19, 2016
Florian Strub, Jeremie Mary, Romaric Gaudel

Collaborative Filtering aims at exploiting the feedback of users to provide personalised recommendations. Such algorithms look for latent variables in a large sparse matrix of ratings. They can be enhanced by adding side information to tackle the well-known cold start problem. While Neural Networks have had tremendous success in image and speech recognition, they have received less attention in Collaborative Filtering. This is all the more surprising given that Neural Networks are able to discover latent variables in large and heterogeneous datasets. In this paper, we introduce a Collaborative Filtering Neural network architecture, CFN, which computes a non-linear Matrix Factorization from sparse rating inputs and side information. We show experimentally on the MovieLens and Douban datasets that CFN outperforms the state of the art and benefits from side information. We provide an implementation of the algorithm as a reusable plugin for Torch, a popular Neural Network framework.


Deep Learning using Linear Support Vector Machines

Feb 21, 2015
Yichuan Tang

Recently, fully-connected and convolutional neural networks have been trained to achieve state-of-the-art performance on a wide variety of tasks such as speech recognition, image classification, natural language processing, and bioinformatics. For classification tasks, most of these "deep learning" models employ the softmax activation function for prediction and minimize cross-entropy loss. In this paper, we demonstrate a small but consistent advantage of replacing the softmax layer with a linear support vector machine. Learning minimizes a margin-based loss instead of the cross-entropy loss. While there have been various combinations of neural nets and SVMs in prior art, our results using L2-SVMs show that simply replacing softmax with linear SVMs gives significant gains on popular deep learning datasets MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop's face expression recognition challenge.

* Contribution to the ICML 2013 Challenges in Representation Learning Workshop 
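
The difference between the two losses named in the abstract above can be illustrated with a minimal sketch. It assumes a one-vs-rest squared hinge (L2-SVM) loss applied to the raw class scores of a single example; the function names are hypothetical, and the weight regularization term and the network itself are omitted, so this is not the paper's implementation.

```python
import math


def softmax_cross_entropy(scores, label):
    """Cross-entropy of the softmax over raw class scores for one example."""
    m = max(scores)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[label]


def l2_svm_loss(scores, label):
    """One-vs-rest squared hinge loss (assumed form, no regularization term).

    Each class k gets a target of +1 if it is the true class and -1 otherwise;
    the loss is zero once every score is on the correct side with margin 1.
    """
    total = 0.0
    for k, s in enumerate(scores):
        t = 1.0 if k == label else -1.0
        total += max(0.0, 1.0 - t * s) ** 2
    return total


if __name__ == "__main__":
    confident = [2.0, -1.5, -2.0]  # true class 0, all margins satisfied
    print(l2_svm_loss(confident, 0))
    print(softmax_cross_entropy(confident, 0))
```

With the margins satisfied, the squared hinge loss is exactly zero, whereas the cross-entropy stays positive and keeps pushing the scores apart.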


Automatic Segmentation of Broadcast News Audio using Self Similarity Matrix

Mar 27, 2014
Sapna Soni, Ahmed Imran, Sunil Kumar Kopparapu

Generally, audio news broadcast on radio is composed of music, commercials, news from correspondents and recorded statements in addition to the actual news read by the newsreader. When news transcripts are available, automatic segmentation of audio news broadcast to time align the audio with the text transcription to build frugal speech corpora is essential. We address the problem of identifying segmentation in the audio news broadcast corresponding to the news read by the newsreader so that they can be mapped to the text transcripts. The existing techniques produce sub-optimal solutions when used to extract newsreader read segments. In this paper, we propose a new technique which is able to identify the acoustic change points reliably using an acoustic Self Similarity Matrix (SSM). We describe the two-pass technique in detail and verify its performance on real audio news broadcast of All India Radio for different languages.

* 4 pages, 5 images 
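
As a rough illustration of what a self-similarity matrix is, the sketch below builds one from per-frame feature vectors and flags frames whose similarity to the preceding frame drops below a threshold. The cosine similarity measure, the threshold, and the function names are all assumptions made for illustration; the paper's two-pass technique is not described in the abstract and is not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def self_similarity_matrix(frames):
    """Entry [i][j] is the similarity between frame i and frame j."""
    return [[cosine(u, v) for v in frames] for u in frames]


def change_points(ssm, threshold=0.5):
    """Indices i where frame i is dissimilar to frame i - 1 (hypothetical rule)."""
    return [i for i in range(1, len(ssm)) if ssm[i - 1][i] < threshold]


if __name__ == "__main__":
    # Toy 2-D "acoustic features": two frames of one source, then two of another.
    frames = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
    ssm = self_similarity_matrix(frames)
    print(change_points(ssm))
```

In a real setting the frames would be acoustic feature vectors (for example, spectral features per short window), and homogeneous segments appear as bright square blocks along the diagonal of the matrix.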


Performing Nonlinear Blind Source Separation with Signal Invariants

Apr 03, 2009
David N. Levin

Given a time series of multicomponent measurements x(t), the usual objective of nonlinear blind source separation (BSS) is to find a "source" time series s(t), comprised of statistically independent combinations of the measured components. In this paper, the source time series is required to have a density function in (s,ds/dt)-space that is equal to the product of density functions of individual components. This formulation of the BSS problem has a solution that is unique, up to permutations and component-wise transformations. Separability is shown to impose constraints on certain locally invariant (scalar) functions of x, which are derived from local higher-order correlations of the data's velocity dx/dt. The data are separable if and only if they satisfy these constraints, and, if the constraints are satisfied, the sources can be explicitly constructed from the data. The method is illustrated by using it to separate two speech-like sounds recorded with a single microphone.

* 8 pages, 3 figures 


Amélioration des Performances des Systèmes Automatiques de Reconnaissance de la Parole pour la Parole Non Native

Nov 07, 2007
Ghazi Bouselmi, Dominique Fohr, Irina Illina, Jean-Paul Haton

In this article, we present an approach for non-native automatic speech recognition (ASR). We propose two methods to adapt existing ASR systems to non-native accents. The first method is based on the modification of acoustic models through integration of acoustic models from the mother tongue. The phonemes of the target language are pronounced in a similar manner to the native language of speakers. We propose to combine the models of confused phonemes so that the ASR system could recognize both concurrent pronunciations. The second method we propose is a refinement of the pronunciation error detection through the introduction of graphemic constraints. Indeed, non-native speakers may rely on the written form of words when pronouncing them. Thus, the pronunciation errors might depend on the characters composing the words. The average error rate reduction that we observed is 22.5% (relative) in sentence error rate and 34.5% (relative) in word error rate.

* In TAIMA'07, Traitement et Analyse de l'Information : Méthodes et Applications (2007)
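
The abstract above reports relative error rate reductions. As a small worked sketch of how a relative reduction differs from an absolute one, the following uses hypothetical baseline and adapted error rates that are not taken from the paper; the function name is also an assumption.

```python
def relative_reduction(baseline_rate, new_rate):
    """Relative reduction: the drop in error rate as a fraction of the baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (baseline_rate - new_rate) / baseline_rate


if __name__ == "__main__":
    # Hypothetical numbers: error rate falls from 40% to 31%.
    baseline, adapted = 40.0, 31.0
    print(f"absolute reduction: {baseline - adapted:.1f} points")
    print(f"relative reduction: {relative_reduction(baseline, adapted):.1%}")
```

Here a drop of 9 percentage points corresponds to a 22.5% relative reduction, which is why the two figures should not be compared directly.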


The Copenhagen Corpus of Eye Tracking Recordings from Natural Reading of Danish Texts

Apr 28, 2022
Nora Hollenstein, Maria Barrett, Marina Björnsdóttir

Eye movement recordings from reading are one of the richest signals of human language processing. Corpora of eye movements during reading of contextualized running text are a way of making such records available for natural language processing purposes. Such corpora already exist in some languages. We present CopCo, the Copenhagen Corpus of eye tracking recordings from natural reading of Danish texts. It is the first eye tracking corpus of its kind for the Danish language. CopCo includes 1,832 sentences with 34,897 tokens of Danish text extracted from a collection of speech manuscripts. This first release of the corpus contains eye tracking data from 22 participants. It will be extended continuously with more participants and texts from other genres. We assess the data quality of the recorded eye movements and find that the extracted features are in line with related research. The dataset is available here:

* accepted at LREC 2022 
