"speech": models, code, and papers

Concept-Based Embeddings for Natural Language Processing

Jul 15, 2018
Yukun Ma, Erik Cambria

In this work, we focus on effectively leveraging and integrating concept-level and word-level information by projecting concepts and words into a lower-dimensional space while retaining the most critical semantics. In the broad context of an opinion understanding system, we investigate the use of the fused embedding for several core NLP tasks: named entity detection and classification, automatic speech recognition reranking, and targeted sentiment analysis.


A Comprehensive Survey on Bengali Phoneme Recognition

Apr 26, 2018
Sadia Tasnim Swarna, Shamim Ehsan, Md. Saiful Islam, Marium E Jannat

Various hidden Markov model based phoneme recognition methods for the Bengali language are reviewed. Automatic phoneme recognition for Bengali using multilayer neural networks is also reviewed, and the usefulness of multilayer neural networks over single-layer neural networks is discussed. The construction and enhancement of a Bangla phonetic feature table for Bengali speech recognition are discussed as well, along with a comparison among these methods.

* 7 pages, reference added in phoneme recognition methods 


Convolutional Attention-based Seq2Seq Neural Network for End-to-End ASR

Oct 12, 2017
Dan Lim

This thesis introduces a sequence-to-sequence model with Luong's attention mechanism for end-to-end ASR. It also describes the neural network techniques, including batch normalization, dropout, and residual networks, that constitute the convolutional attention-based seq2seq neural network. Finally, the proposed model proves effective for speech recognition, achieving a 15.8% phoneme error rate on the TIMIT dataset.

* Masters thesis, Korea Univ 
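
For readers unfamiliar with the metric quoted in the abstract above, phoneme error rate is conventionally the edit distance between the recognized and reference phoneme sequences divided by the reference length. The following is a minimal sketch of that standard definition, not the thesis's evaluation code; the function names and the example phoneme sequences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of phoneme labels."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (ref phoneme missing in hyp)
                curr[j - 1] + 1,     # insertion (extra phoneme in hyp)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example sequences, not taken from TIMIT.
reference = ["sh", "iy", "hh", "ae", "d"]
hypothesis = ["sh", "iy", "ae", "t"]
per = phoneme_error_rate(reference, hypothesis)
print(f"PER = {per:.1%}")
```

Here one phoneme is deleted and one is substituted, so the sketch reports 2 errors over 5 reference phonemes.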


Generating Memorable Mnemonic Encodings of Numbers

May 07, 2017
Vincent Fiorentini, Megan Shao, Julie Medero

The major system is a mnemonic system that can be used to memorize sequences of numbers. In this work, we present a method to automatically generate sentences that encode a given number. We propose several encoding models and compare the most promising ones in a password memorability study. The results of the study show that a model combining part-of-speech sentence templates with an $n$-gram language model produces the most memorable password representations.
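
As a rough illustration of the major system mentioned above, the sketch below decodes a phrase back into the digits it encodes using the conventional digit-to-consonant table (0 = s/z, 1 = t/d, 2 = n, 3 = m, 4 = r, 5 = l, 6 = j, 7 = k/g, 8 = f/v, 9 = p/b). It is not the authors' generation model: it runs in the opposite direction, and it assumes spelling can stand in for pronunciation, which the real (phonetic) system does not. The names `LETTER_TO_DIGIT`, `decode_word`, and `decode_sentence` are hypothetical.

```python
# Conventional major-system table, keyed by letter.
# Assumption: matching on spelling rather than sound, so digraphs such as
# "sh"/"ch", soft "g", and the letter "c" are deliberately not handled.
LETTER_TO_DIGIT = {
    "s": "0", "z": "0",
    "t": "1", "d": "1",
    "n": "2",
    "m": "3",
    "r": "4",
    "l": "5",
    "j": "6",
    "k": "7", "g": "7",
    "f": "8", "v": "8",
    "p": "9", "b": "9",
}


def decode_word(word):
    """Return the digits encoded by a word; vowels and unmapped letters are skipped."""
    return "".join(LETTER_TO_DIGIT.get(ch, "") for ch in word.lower())


def decode_sentence(sentence):
    """Concatenate the digits encoded by each whitespace-separated word."""
    return "".join(decode_word(word) for word in sentence.split())


print(decode_word("motor"))          # m=3, t=1, r=4
print(decode_sentence("tame lion"))  # t=1, m=3, l=5, n=2
```

A generator of the kind the abstract describes would have to search for fluent word sequences whose decoding equals the target number; a decoder like this one could serve as a check on its output.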


Outlier-Robust Convex Segmentation

Nov 18, 2014
Itamar Katz, Koby Crammer

We derive a convex optimization problem for the task of segmenting sequential data, which explicitly treats the presence of outliers. We describe two algorithms for solving this problem, one exact and one a novel top-down approach, and we derive a consistency result for the case of two segments and no outliers. Robustness to outliers is evaluated on two real-world tasks related to speech segmentation. Our algorithms outperform baseline segmentation algorithms.

* Accepted to AAAI-15. This version includes the appendix/supplementary material referenced in the AAAI-15 submission, as well as color figures, and corrects some minor typos 


HMM Specialization with Selective Lexicalization

Dec 23, 1999
Jin-Dong Kim, Sang-Zoo Lee, Hae-Chang Rim

We present a technique which complements Hidden Markov Models by incorporating some lexicalized states representing syntactically uncommon words. Our approach examines the distribution of transitions, selects the uncommon words, and makes lexicalized states for the words. We performed a part-of-speech tagging experiment on the Brown corpus to evaluate the resultant language model and discovered that this technique improved the tagging accuracy by 0.21% at the 95% level of confidence.

* Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pp.121-127, 1999 
* 7 pages, 6 figures 


Hand-crafted Attention is All You Need? A Study of Attention on Self-supervised Audio Transformer

Jun 09, 2020
Tsung-Han Wu, Chun-Chen Hsieh, Yen-Hao Chen, Po-Han Chi, Hung-yi Lee

In this paper, we seek to reduce the computational complexity of transformer-based models for speech representation learning. We evaluate 10 attention mechanisms; then, we pre-train the transformer-based model with each of those attention mechanisms in a self-supervised fashion and use the resulting models as feature extractors on downstream tasks, including phoneme classification and speaker classification. We find that the proposed approach, which only uses hand-crafted and learnable attentions, is comparable with the full self-attention.


OpenSeq2Seq: extensible toolkit for distributed and mixed precision training of sequence-to-sequence models

May 25, 2018
Oleksii Kuchaiev, Boris Ginsburg, Igor Gitman, Vitaly Lavrukhin, Carl Case, Paulius Micikevicius

We present OpenSeq2Seq -- an open-source toolkit for training sequence-to-sequence models. The main goal of our toolkit is to allow researchers to most effectively explore different sequence-to-sequence architectures. The efficiency is achieved by fully supporting distributed and mixed-precision training. OpenSeq2Seq provides building blocks for training encoder-decoder models for neural machine translation and automatic speech recognition. We plan to extend it with other modalities in the future.

* to be presented at Workshop for Natural Language Processing Open Source Software (NLP-OSS), co-located with ACL2018 


A Freely Available Syntactic Lexicon for English

Oct 21, 1994
Dania Egedi, Patrick Martin

This paper presents a syntactic lexicon for English that was originally derived from the Oxford Advanced Learner's Dictionary and the Oxford Dictionary of Current Idiomatic English, and then modified and augmented by hand. There are more than 37,000 syntactic entries from all 8 parts of speech. An X-windows based tool is available for maintaining the lexicon and performing searches. C and Lisp hooks are also available so that the lexicon can be easily utilized by parsers and other programs.

* Proceedings of the International Workshop on Sharable Natural Language Resources, Nara, Japan, August 1994 
* Latex file with .eps figure. 8 pages 


Detecting Hateful Memes Using a Multimodal Deep Ensemble

Dec 24, 2020
Vlad Sandulescu

While significant progress has been made using machine learning algorithms to detect hate speech, important technical challenges still remain to be solved in order to bring their performance closer to human accuracy. We investigate several of the most recent visual-linguistic Transformer architectures and propose improvements to increase their performance for this task. The proposed model outperforms the baselines by a large margin and ranks 5th on the leaderboard out of 3,100+ participants.

* The Hateful Memes Challenge Workshop at NeurIPS 2020 
* 6 pages, NeurIPS 2020, The Hateful Memes Challenge Workshop at NeurIPS 2020 
