Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"speech": models, code, and papers

ReINTEL Challenge 2020: Exploiting Transfer Learning Modelsfor Reliable Intelligence Identification on Vietnamese Social Network Sites

Feb 23, 2021
Kim Thi-Thanh Nguyen, Kiet Van Nguyen

This paper presents the system that we propose for the Reliable Intelligence Indentification on Vietnamese Social Network Sites (ReINTEL) task of the Vietnamese Language and Speech Processing 2020 (VLSP 2020) Shared Task. In this task, the VLSP 2020 provides a dataset with approximately 6,000 trainning news/posts annotated with reliable or unreliable labels, and a test set consists of 2,000 examples without labels. In this paper, we conduct experiments on different transfer learning models, which are bert4news and PhoBERT fine-tuned to predict whether the news is reliable or not. In our experiments, we achieve the AUC score of 94.52% on the private test set from ReINTEL's organizers.

* 4 pages, ReINTEL Task at VLSP 2020 Workshop 

  Access Paper or Ask Questions

MIDAS at SemEval-2020 Task 10: Emphasis Selection using Label Distribution Learning and Contextual Embeddings

Sep 06, 2020
Sarthak Anand, Pradyumna Gupta, Hemant Yadav, Debanjan Mahata, Rakesh Gosangi, Haimin Zhang, Rajiv Ratn Shah

This paper presents our submission to the SemEval 2020 - Task 10 on emphasis selection in written text. We approach this emphasis selection problem as a sequence labeling task where we represent the underlying text with various contextual embedding models. We also employ label distribution learning to account for annotator disagreements. We experiment with the choice of model architectures, trainability of layers, and different contextual embeddings. Our best performing architecture is an ensemble of different models, which achieved an overall matching score of 0.783, placing us 15th out of 31 participating teams. Lastly, we analyze the results in terms of parts of speech tags, sentence lengths, and word ordering.

  Access Paper or Ask Questions

Arabic Offensive Language on Twitter: Analysis and Experiments

Apr 05, 2020
Hamdy Mubarak, Ammar Rashed, Kareem Darwish, Younes Samih, Ahmed Abdelali

Detecting offensive language on Twitter has many applications ranging from detecting/predicting bullying to measuring polarization. In this paper, we focus on building effective Arabic offensive tweet detection. We introduce a method for building an offensive dataset that is not biased by topic, dialect, or target. We produce the largest Arabic dataset to date with special tags for vulgarity and hate speech. Next, we analyze the dataset to determine which topics, dialects, and gender are most associated with offensive tweets and how Arabic speakers use offensive language. Lastly, we conduct a large battery of experiments to produce strong results (F1 = 79.7) on the dataset using Support Vector Machine techniques.

* 10 pages, 6 figures, 3 tables 

  Access Paper or Ask Questions

A Joint Approach to Compound Splitting and Idiomatic Compound Detection

Mar 21, 2020
Irina Krotova, Sergey Aksenov, Ekaterina Artemova

Applications such as machine translation, speech recognition, and information retrieval require efficient handling of noun compounds as they are one of the possible sources for out-of-vocabulary (OOV) words. In-depth processing of noun compounds requires not only splitting them into smaller components (or even roots) but also the identification of instances that should remain unsplitted as they are of idiomatic nature. We develop a two-fold deep learning-based approach of noun compound splitting and idiomatic compound detection for the German language that we train using a newly collected corpus of annotated German compounds. Our neural noun compound splitter operates on a sub-word level and outperforms the current state of the art by about 5%.

* 8 pages, 5 tables, 1 figure, accepted at LREC 2020 

  Access Paper or Ask Questions

Detecting Mismatch between Text Script and Voice-over Using Utterance Verification Based on Phoneme Recognition Ranking

Mar 20, 2020
Yoonjae Jeong, Hoon-Young Cho

The purpose of this study is to detect the mismatch between text script and voice-over. For this, we present a novel utterance verification (UV) method, which calculates the degree of correspondence between a voice-over and the phoneme sequence of a script. We found that the phoneme recognition probabilities of exaggerated voice-overs decrease compared to ordinary utterances, but their rankings do not demonstrate any significant change. The proposed method, therefore, uses the recognition ranking of each phoneme segment corresponding to a phoneme sequence for measuring the confidence of a voice-over utterance for its corresponding script. The experimental results show that the proposed UV method outperforms a state-of-the-art approach using cross modal attention used for detecting mismatch between speech and transcription.

* Accepted by ICASSP 2020 

  Access Paper or Ask Questions

Hybrid Autoregressive Transducer (hat)

Mar 12, 2020
Ehsan Variani, David Rybach, Cyril Allauzen, Michael Riley

This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model, a time-synchronous encoderdecoder model that preserves the modularity of conventional automatic speech recognition systems. The HAT model provides a way to measure the quality of the internal language model that can be used to decide whether inference with an external language model is beneficial or not. This article also presents a finite context version of the HAT model that addresses the exposure bias problem and significantly simplifies the overall training and inference. We evaluate our proposed model on a large-scale voice search task. Our experiments show significant improvements in WER compared to the state-of-the-art approaches.

  Access Paper or Ask Questions

Learning To Detect Keyword Parts And Whole By Smoothed Max Pooling

Jan 25, 2020
Hyun-Jin Park, Patrick Violette, Niranjan Subrahmanya

We propose smoothed max pooling loss and its application to keyword spotting systems. The proposed approach jointly trains an encoder (to detect keyword parts) and a decoder (to detect whole keyword) in a semi-supervised manner. The proposed new loss function allows training a model to detect parts and whole of a keyword, without strictly depending on frame-level labeling from LVCSR (Large vocabulary continuous speech recognition), making further optimization possible. The proposed system outperforms the baseline keyword spotting model in [1] due to increased optimizability. Further, it can be more easily adapted for on-device learning applications due to reduced dependency on LVCSR.

* Accepted in International Conference on Acoustics, Speech, and Signal Processing 2020 

  Access Paper or Ask Questions

Acoustic Model Adaptation from Raw Waveforms with SincNet

Sep 30, 2019
Joachim Fainberg, Ondřej Klejch, Erfan Loweimi, Peter Bell, Steve Renals

Raw waveform acoustic modelling has recently gained interest due to neural networks' ability to learn feature extraction, and the potential for finding better representations for a given scenario than hand-crafted features. SincNet has been proposed to reduce the number of parameters required in raw-waveform modelling, by restricting the filter functions, rather than having to learn every tap of each filter. We study the adaptation of the SincNet filter parameters from adults' to children's speech, and show that the parameterisation of the SincNet layer is well suited for adaptation in practice: we can efficiently adapt with a very small number of parameters, producing error rates comparable to techniques using orders of magnitude more parameters.

* Accepted to IEEE ASRU 2019 

  Access Paper or Ask Questions

Identifying Personality Traits Using Overlap Dynamics in Multiparty Dialogue

Sep 02, 2019
Mingzhi Yu, Emer Gilmartin, Diane Litman

Research on human spoken language has shown that speech plays an important role in identifying speaker personality traits. In this work, we propose an approach for identifying speaker personality traits using overlap dynamics in multiparty spoken dialogues. We first define a set of novel features representing the overlap dynamics of each speaker. We then investigate the impact of speaker personality traits on these features using ANOVA tests. We find that features of overlap dynamics significantly vary for speakers with different levels of both Extraversion and Conscientiousness. Finally, we find that classifiers using only overlap dynamics features outperform random guessing in identifying Extraversion and Agreeableness, and that the improvements are statistically significant.

* Proceedings Interspeech 2019, Graz, Austria, September 

  Access Paper or Ask Questions

A survey on Adversarial Attacks and Defenses in Text

Feb 12, 2019
Wenqi Wang, Benxiao Tang, Run Wang, Lina Wang, Aoshuang Ye

Deep neural networks (DNNs) have shown an inherent vulnerability to adversarial examples which are maliciously crafted on real examples by attackers, aiming at making target DNNs misbehave. The threats of adversarial examples are widely existed in image, voice, speech, and text recognition and classification. Inspired by the previous work, researches on adversarial attacks and defenses in text domain develop rapidly. To the best of our knowledge, this article presents a comprehensive review on adversarial examples in text. We analyze the advantages and shortcomings of recent adversarial examples generation methods and elaborate the efficiency and limitations on countermeasures. Finally, we discuss the challenges in adversarial texts and provide a research direction of this aspect.

  Access Paper or Ask Questions