Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"speech": models, code, and papers

Convolutional Neural Network Architectures for Matching Natural Language Sentences

Mar 11, 2015
Baotian Hu, Zhengdong Lu, Hang Li, Qingcai Chen

Semantic matching is of central importance to many natural language tasks \cite{bordes2014semantic,RetrievalQA}. A successful matching algorithm needs to adequately model the internal structures of language objects and the interaction between them. As a step toward this goal, we propose convolutional neural network models for matching two sentences, by adapting the convolutional strategy in vision and speech. The proposed models not only nicely represent the hierarchical structures of sentences with their layer-by-layer composition and pooling, but also capture the rich matching patterns at different levels. Our models are rather generic, requiring no prior knowledge on language, and can hence be applied to matching tasks of different nature and in different languages. The empirical study on a variety of matching tasks demonstrates the efficacy of the proposed model on a variety of matching tasks and its superiority to competitor models.

  Access Paper or Ask Questions

HPS: a hierarchical Persian stemming method

Mar 12, 2014
Ayshe Rashidi, Mina Zolfy Lighvan

In this paper, a novel hierarchical Persian stemming approach based on the Part-Of-Speech of the word in a sentence is presented. The implemented stemmer includes hash tables and several deterministic finite automata in its different levels of hierarchy for removing the prefixes and suffixes of the words. We had two intentions in using hash tables in our method. The first one is that the DFA don't support some special words, so hash table can partly solve the addressed problem. the second goal is to speed up the implemented stemmer with omitting the time that deterministic finite automata need. Because of the hierarchical organization, this method is fast and flexible enough. Our experiments on test sets from Hamshahri collection and security news ( show that our method has the average accuracy of 95.37% which is even improved in using the method on a test set with common topics.

* 10 pages, 6 tables, 2 figures, International Journal on Natural Language Computing (IJNLC), International Journal on Natural Language Computing (IJNLC) Vol. 3, No.1, February 2014 

  Access Paper or Ask Questions

Group-Sparse Signal Denoising: Non-Convex Regularization, Convex Optimization

Nov 30, 2013
Po-Yu Chen, Ivan W. Selesnick

Convex optimization with sparsity-promoting convex regularization is a standard approach for estimating sparse signals in noise. In order to promote sparsity more strongly than convex regularization, it is also standard practice to employ non-convex optimization. In this paper, we take a third approach. We utilize a non-convex regularization term chosen such that the total cost function (consisting of data consistency and regularization terms) is convex. Therefore, sparsity is more strongly promoted than in the standard convex formulation, but without sacrificing the attractive aspects of convex optimization (unique minimum, robust algorithms, etc.). We use this idea to improve the recently developed 'overlapping group shrinkage' (OGS) algorithm for the denoising of group-sparse signals. The algorithm is applied to the problem of speech enhancement with favorable results in terms of both SNR and perceptual quality.

* 14 pages, 11 figures 

  Access Paper or Ask Questions

Transformer-based Multimodal Information Fusion for Facial Expression Analysis

Mar 23, 2022
Wei Zhang, Zhimeng Zhang, Feng Qiu, Suzhen Wang, Bowen Ma, Hao Zeng, Rudong An, Yu Ding

Facial expression analysis has been a crucial research problem in the computer vision area. With the recent development of deep learning techniques and large-scale in-the-wild annotated datasets, facial expression analysis is now aimed at challenges in real world settings. In this paper, we introduce our submission to CVPR2022 Competition on Affective Behavior Analysis in-the-wild (ABAW) that defines four competition tasks, including expression classification, action unit detection, valence-arousal estimation, and a multi-task-learning. The available multimodal information consist of spoken words, speech prosody, and visual expression in videos. Our work proposes four unified transformer-based network frameworks to create the fusion of the above multimodal information. The preliminary results on the official Aff-Wild2 dataset are reported and demonstrate the effectiveness of our proposed method.

  Access Paper or Ask Questions

MIPE: A Metric Independent Pipeline for Effective Code-Mixed NLG Evaluation

Jul 24, 2021
Ayush Garg, Sammed S Kagi, Vivek Srivastava, Mayank Singh

Code-mixing is a phenomenon of mixing words and phrases from two or more languages in a single utterance of speech and text. Due to the high linguistic diversity, code-mixing presents several challenges in evaluating standard natural language generation (NLG) tasks. Various widely popular metrics perform poorly with the code-mixed NLG tasks. To address this challenge, we present a metric independent evaluation pipeline MIPE that significantly improves the correlation between evaluation metrics and human judgments on the generated code-mixed text. As a use case, we demonstrate the performance of MIPE on the machine-generated Hinglish (code-mixing of Hindi and English languages) sentences from the HinGE corpus. We can extend the proposed evaluation strategy to other code-mixed language pairs, NLG tasks, and evaluation metrics with minimal to no effort.

  Access Paper or Ask Questions

Weighted Training for Cross-Task Learning

May 28, 2021
Shuxiao Chen, Koby Crammer, Hangfeng He, Dan Roth, Weijie J. Su

In this paper, we introduce Target-Aware Weighted Training (TAWT), a weighted training algorithm for cross-task learning based on minimizing a representation-based task distance between the source and target tasks. We show that TAWT is easy to implement, is computationally efficient, requires little hyperparameter tuning, and enjoys non-asymptotic learning-theoretic guarantees. The effectiveness of TAWT is corroborated through extensive experiments with BERT on four sequence tagging tasks in natural language processing (NLP), including part-of-speech (PoS) tagging, chunking, predicate detection, and named entity recognition (NER). As a byproduct, the proposed representation-based task distance allows one to reason in a theoretically principled way about several critical aspects of cross-task learning, such as the choice of the source data and the impact of fine-tuning

* 21 pages, 3 figures, 6 tables 

  Access Paper or Ask Questions

Visually grounded models of spoken language: A survey of datasets, architectures and evaluation techniques

May 01, 2021
Grzegorz Chrupała

This survey provides an overview of the evolution of visually grounded models of spoken language over the last 20 years. Such models are inspired by the observation that when children pick up a language, they rely on a wide range of indirect and noisy clues, crucially including signals from the visual modality co-occurring with spoken utterances. Several fields have made important contributions to this approach to modeling or mimicking the process of learning language: Machine Learning, Natural Language and Speech Processing, Computer Vision and Cognitive Science. The current paper brings together these contributions in order to provide a useful introduction and overview for practitioners in all these areas. We discuss the central research questions addressed, the timeline of developments, and the datasets which enabled much of this work. We then summarize the main modeling architectures and offer an exhaustive overview of the evaluation metrics and analysis techniques.

  Access Paper or Ask Questions

ICASSP 2021 Deep Noise Suppression Challenge: Decoupling Magnitude and Phase Optimization with a Two-Stage Deep Network

Mar 01, 2021
Andong Li, Wenzhe Liu, Xiaoxue Luo, Chengshi Zheng, Xiaodong Li

It remains a tough challenge to recover the speech signals contaminated by various noises under real acoustic environments. To this end, we propose a novel system for denoising in the complicated applications, which is mainly comprised of two pipelines, namely a two-stage network and a post-processing module. The first pipeline is proposed to decouple the optimization problem w:r:t: magnitude and phase, i.e., only the magnitude is estimated in the first stage and both of them are further refined in the second stage. The second pipeline aims to further suppress the remaining unnatural distorted noise, which is demonstrated to sufficiently improve the subjective quality. In the ICASSP 2021 Deep Noise Suppression (DNS) Challenge, our submitted system ranked top-1 for the real-time track 1 in terms of Mean Opinion Score (MOS) with ITU-T P.808 framework.

* 5 pages, 3 figures, accepted by ICASSP 2021 

  Access Paper or Ask Questions