"speech": models, code, and papers

A mathematical model of the vowel space

Nov 02, 2021
Frédéric Berthommier

The articulatory-acoustic relationship is many-to-one and nonlinear, which is a major limitation for studying speech production. A simplification is proposed to set a bijection between the vowel space (f1, f2) and the parametric space of different vocal tract models. The generic area function model is based on mixtures of cosines, allowing the generation of the main vowels with two formulas. The mixture function is then transformed into a coordination function able to deal with articulatory parameters. It is shown that the coordination function acts similarly with Fant's model and with the 4-Tube DRM derived from the generic model.

Automated Testing of AI Models

Oct 07, 2021
Swagatam Haldar, Deepak Vijaykeerthy, Diptikalyan Saha

The last decade has seen tremendous progress in AI technology and applications. With such widespread adoption, ensuring the reliability of AI models is crucial. In the past, we took the first step of creating a testing framework called AITEST for metamorphic properties, such as fairness and robustness, of tabular, time-series, and text classification models. In this paper, we extend the capability of the AITEST tool to include testing techniques for image and speech-to-text models, along with interpretability testing for tabular models. These novel extensions make AITEST a comprehensive framework for testing AI models.

* 5 pages, 3 Figures, 4 Tables 
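
As an illustration of what a metamorphic robustness property can look like for a tabular model, here is a minimal sketch. It is not AITEST's API or method: the function names, the toy `threshold_model`, and the perturbation size `eps` are all assumptions made for this example. The property checked is that a small change to a single feature should not change the predicted label.

```python
from typing import Callable, Iterator, List, Sequence


def perturbations(row: Sequence[float], eps: float) -> Iterator[List[float]]:
    """Yield copies of `row` with one feature shifted by -eps or +eps."""
    for j in range(len(row)):
        for delta in (-eps, eps):
            changed = list(row)
            changed[j] += delta
            yield changed


def robustness_violations(
    predict: Callable[[Sequence[float]], int],
    rows: Sequence[Sequence[float]],
    eps: float = 0.01,
) -> List[int]:
    """Return indices of rows whose label changes under a small perturbation."""
    return [
        i
        for i, row in enumerate(rows)
        if any(predict(p) != predict(row) for p in perturbations(row, eps))
    ]


def threshold_model(row: Sequence[float]) -> int:
    """Hypothetical stand-in classifier: label 1 when the features sum above 1."""
    return int(sum(row) > 1.0)


rows = [[0.2, 0.3], [0.5, 0.5], [0.9, 0.8]]
violations = robustness_violations(threshold_model, rows)
print(violations)  # only the row sitting on the decision boundary is flagged
```

No ground-truth labels are needed: the test only compares the model's outputs on related inputs, which is what makes the property metamorphic.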

Comparing Acoustic-based Approaches for Alzheimer's Disease Detection

Jun 03, 2021
Aparna Balagopalan, Jekaterina Novikova

In this paper, we study the performance and generalizability of three approaches for AD detection from speech on the recent ADReSSo challenge dataset: 1) using conventional acoustic features, 2) using novel pre-trained acoustic embeddings, and 3) combining acoustic features and embeddings. We find that while feature-based approaches have higher precision, classification approaches relying on the combination of embeddings and features achieve higher and more balanced performance across multiple metrics. Our best model, using such a combined approach, outperforms the acoustic baseline in the challenge by 2.8%.

* Accepted to INTERSPEECH 2021 

Rediscovering the Slavic Continuum in Representations Emerging from Neural Models of Spoken Language Identification

Oct 22, 2020
Badr M. Abdullah, Jacek Kudera, Tania Avgustinova, Bernd Möbius, Dietrich Klakow

Deep neural networks have been employed for various spoken language recognition tasks, including tasks that are multilingual by definition such as spoken language identification. In this paper, we present a neural model for Slavic language identification in speech signals and analyze its emergent representations to investigate whether they reflect objective measures of language relatedness and/or non-linguists' perception of language similarity. While our analysis shows that the language representation space indeed captures language relatedness to a great extent, we find perceptual confusability between languages in our study to be the best predictor of the language representation similarity.

* Accepted in VarDial 2020 Workshop 

Parsing Early Modern English for Linguistic Search

Feb 24, 2020
Seth Kulick, Neville Ryant

We investigate the question of whether advances in NLP over the last few years make it possible to vastly increase the size of data usable for research in historical syntax. This brings together many of the usual tools in NLP - word embeddings, tagging, and parsing - in the service of linguistic queries over automatically annotated corpora. We train a part-of-speech (POS) tagger and parser on a corpus of historical English, using ELMo embeddings trained over a billion words of similar text. The evaluation is based on the standard metrics, as well as on the accuracy of the query searches using the parsed data.

A Comparison of Techniques for Sentiment Classification of Film Reviews

May 12, 2019
Milan Gritta

We undertake the task of comparing lexicon-based sentiment classification of film reviews with machine learning approaches. We look at existing methodologies and attempt to emulate and improve on them using a 'given' lexicon and a bag-of-words approach. We also utilise syntactic information such as part-of-speech and dependency relations. We show that a simple lexicon-based classification achieves good results; however, machine learning techniques prove to be the superior tool. We also show that more features do not necessarily deliver better performance, and we elaborate on three further enhancements not tested in this article.

* A short paper from my MPhil in Advanced Computer Science (2014-15) 
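
To make the idea of lexicon-based classification concrete, here is a minimal sketch. It is not the paper's method or its 'given' lexicon: the word lists `POSITIVE` and `NEGATIVE`, the tokenizer, and the tie-breaking rule are all assumptions chosen for illustration. Each review is scored by counting positive words and subtracting negative ones.

```python
import re

# Hypothetical toy lexicon; a real one would hold thousands of entries.
POSITIVE = {"good", "great", "excellent", "enjoyable", "brilliant"}
NEGATIVE = {"bad", "boring", "awful", "dull", "terrible"}


def lexicon_score(text: str) -> int:
    """Count positive tokens minus negative tokens in a review."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)


def classify(text: str) -> str:
    """Label a review by the sign of its score (ties count as positive here)."""
    return "positive" if lexicon_score(text) >= 0 else "negative"


print(classify("A brilliant, enjoyable film with a great cast"))
print(classify("Dull plot and terrible acting"))
```

A classifier like this needs no training data, which is its main appeal; it also ignores negation and context, which is one reason a learned bag-of-words model can do better.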

Meta Learning for Few-shot Keyword Spotting

Dec 26, 2018
Yangbin Chen, Tom Ko, Lifeng Shang, Xiao Chen, Xin Jiang, Qing Li

Keyword spotting with limited training data is a challenging task which can be treated as a few-shot learning problem. In this paper, we present a meta-learning approach which learns a good initialization of the base KWS model from an existing labeled dataset. The model can then quickly adapt to new keyword spotting tasks with only a few labeled examples. Furthermore, to strengthen the ability to distinguish the keywords from the others, we incorporate the negative class as external knowledge into the meta-training process, which proves to be effective. Experiments on the Google Speech Commands dataset show that our proposed approach outperforms the baselines.

What can we learn from Semantic Tagging?

Aug 29, 2018
Mostafa Abdou, Artur Kulmizev, Vinit Ravishankar, Lasha Abzianidze, Johan Bos

We investigate the effects of multi-task learning using the recently introduced task of semantic tagging. We employ semantic tagging as an auxiliary task for three different NLP tasks: part-of-speech tagging, Universal Dependency parsing, and Natural Language Inference. We compare full neural network sharing, partial neural network sharing, and what we term the "learning what to share" setting, where negative transfer between tasks is less likely. Our findings show considerable improvements for all tasks, particularly in the "learning what to share" setting, which shows consistent gains across all tasks.

* 9 pages with references and appendixes. EMNLP 2018 camera ready 

Classifier Ensembles for Dialect and Language Variety Identification

Aug 14, 2018
Liviu P. Dinu, Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi

In this paper we present ensemble-based systems for dialect and language variety identification using the datasets made available by the organizers of the VarDial Evaluation Campaign 2018. We present a system developed to discriminate between Flemish and Dutch in subtitles and a system trained to discriminate between four Arabic dialects (Egyptian, Levantine, Gulf, and North African) and Modern Standard Arabic in speech broadcasts. Finally, we compare the performance of these two systems with the other systems submitted to the Discriminating between Dutch and Flemish in Subtitles (DFS) and the Arabic Dialect Identification (ADI) shared tasks at VarDial 2018.

Characterizing the Language of Online Communities and its Relation to Community Reception

Sep 15, 2016
Trang Tran, Mari Ostendorf

This work investigates style and topic aspects of language in online communities: looking at both utility as an identifier of the community and correlation with community reception of content. Style is characterized using a hybrid word and part-of-speech tag n-gram language model, while topic is represented using Latent Dirichlet Allocation. Experiments with several Reddit forums show that style is a better indicator of community identity than topic, even for communities organized around specific topics. Further, there is a positive correlation between the community reception to a contribution and the style similarity to that community, but not so for topic similarity.

* EMNLP 2016 
