Natalia Tomashenko

A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems

Apr 04, 2022

Marcely Zanon Boito, Laurent Besacier, Natalia Tomashenko, Yannick Estève

Abstract: Self-supervised models have recently emerged as popular building blocks in speech processing pipelines. These models are pre-trained on unlabeled audio data and then used in downstream speech processing tasks such as automatic speech recognition (ASR) or speech translation (ST). Since these models are now used in research and industrial systems alike, it is necessary to understand the impact of factors such as the gender distribution within the pre-training data. Using French as our investigation language, we train and compare gender-specific wav2vec 2.0 models against models containing different degrees of gender balance in their pre-training data. The comparison is performed by applying these models to two speech-to-text downstream tasks: ASR and ST. Our results show that the type of downstream integration matters. We observe lower overall performance when using gender-specific pre-training before fine-tuning an end-to-end ASR system. However, when self-supervised models are used as feature extractors, the overall ASR and ST results follow more complex patterns, in which the balanced pre-trained model is not necessarily the best option. Lastly, our crude 'fairness' metric, the relative performance difference measured between female and male test sets, does not vary strongly between balanced and gender-specific pre-trained wav2vec 2.0 models.

* Submitted to INTERSPEECH 2022
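
The 'fairness' metric above compares a performance score between female and male test sets. As an illustration only, the sketch below computes corpus-level word error rate (WER) with a word-level Levenshtein distance and then a relative female/male gap. The exact definition used in the paper is not given here, so taking the male WER as the reference in `relative_gender_gap` is an assumption, and the function names and toy sentences are hypothetical.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    # prev[j] = distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion of ref[i-1]
                          curr[j - 1] + 1,      # insertion of hyp[j-1]
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(pairs: List[Tuple[str, str]]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    if words == 0:
        raise ValueError("reference side is empty")
    return errors / words


def relative_gender_gap(wer_female: float, wer_male: float) -> float:
    """Hypothetical fairness score: relative WER difference, male set as reference."""
    if wer_male == 0:
        raise ValueError("male WER is zero; relative gap is undefined")
    return (wer_female - wer_male) / wer_male


# Toy (reference, hypothesis) pairs standing in for female and male test sets
female = [("le chat dort", "le chien dort"), ("bonjour", "bonjour")]
male = [("il fait beau", "il fait beau"), ("merci beaucoup", "merci")]
wer_f = corpus_wer(female)  # 1 substitution over 4 reference words
wer_m = corpus_wer(male)    # 1 deletion over 5 reference words
print(wer_f, wer_m, relative_gender_gap(wer_f, wer_m))
```

A value near zero means similar performance on both test sets; a positive value means the female set has the higher WER under this convention.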

The VoicePrivacy 2022 Challenge Evaluation Plan

Mar 27, 2022

Natalia Tomashenko, Xin Wang, Xiaoxiao Miao, Hubert Nourtel, Pierre Champion, Massimiliano Todisco, Emmanuel Vincent, Nicholas Evans, Junichi Yamagishi, Jean-François Bonastre

Abstract: For new participants, an executive summary: (1) The task is to develop a voice anonymization system for speech data that conceals the speaker's voice identity while protecting linguistic content, paralinguistic attributes, intelligibility and naturalness. (2) Training, development and evaluation datasets are provided, in addition to 3 different baseline anonymization systems, evaluation scripts, and metrics. Participants apply their anonymization systems, run the evaluation scripts and submit objective evaluation results and anonymized speech data to the organizers. (3) Results will be presented at a workshop held in conjunction with INTERSPEECH 2022, to which all participants are invited to present their challenge systems and to submit additional workshop papers.

For readers familiar with the VoicePrivacy Challenge, the changes with respect to 2020 are: (1) A stronger, semi-informed attack model in the form of an automatic speaker verification (ASV) system trained on anonymized (per-utterance) speech data. (2) Complementary metrics comprising the equal error rate (EER) as a privacy metric, the word error rate (WER) as a primary utility metric, and the pitch correlation and gain of voice distinctiveness as secondary utility metrics. (3) A new ranking policy based upon a set of minimum target privacy requirements.

* The file is unchanged; minor correction in metadata
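
The EER used as the privacy metric is the operating point at which the attacker's false acceptance rate equals its false rejection rate; a higher EER for the attacker's ASV system indicates stronger privacy. The sketch below approximates it with a simple threshold sweep over the observed scores. It is not the official VoicePrivacy evaluation script, which may interpolate the operating point more carefully, and the toy scores are hypothetical.

```python
from typing import Sequence


def equal_error_rate(target_scores: Sequence[float],
                     nontarget_scores: Sequence[float]) -> float:
    """Approximate EER by sweeping the decision threshold over all observed scores.

    A trial is accepted as "same speaker" when its score is >= the threshold.
    """
    if len(target_scores) == 0 or len(nontarget_scores) == 0:
        raise ValueError("need at least one target and one non-target score")
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    thresholds.append(float("inf"))  # threshold that rejects every trial
    best_gap, best_eer = float("inf"), 1.0
    for t in thresholds:
        frr = sum(s < t for s in target_scores) / len(target_scores)         # missed targets
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)  # accepted impostors
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


# Toy scores from a hypothetical attacker ASV system on anonymized trials
print(equal_error_rate([0.6, 0.9], [0.5, 0.7]))  # prints 0.5
```

The quadratic sweep keeps the logic easy to check; for large trial lists, sorting the scores once and walking the thresholds in order gives the same result in O(n log n).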

Language-Independent Speaker Anonymization Approach using Self-Supervised Pre-Trained Models

Mar 26, 2022

Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko

Abstract: Speaker anonymization aims to protect the privacy of speakers while preserving the spoken linguistic information in speech. Current mainstream neural network speaker anonymization systems are complicated, containing an F0 extractor, a speaker encoder, an automatic speech recognition acoustic model (ASR AM), a speech synthesis acoustic model and a speech waveform generation model. Moreover, because the ASR AM is language-dependent and trained on English data, it is hard to adapt to another language. In this paper, we propose a simpler self-supervised learning (SSL)-based method for language-independent speaker anonymization without any explicit language-dependent model, which can easily be applied to other languages. Extensive experiments were conducted on the English VoicePrivacy Challenge 2020 datasets and the Mandarin AISHELL-3 dataset to demonstrate the effectiveness of our proposed SSL-based language-independent speaker anonymization method.

Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition

Nov 07, 2021

Salima Mdhaffar, Jean-François Bonastre, Marc Tommasi, Natalia Tomashenko, Yannick Estève

Abstract: The widespread use of powerful personal devices capable of collecting their users' voices has opened the opportunity to build speaker-adapted automatic speech recognition (ASR) systems or to participate in collaborative learning of ASR. In both cases, personalized acoustic models (AMs), i.e., AMs fine-tuned on a specific speaker's data, can be built. A question that naturally arises is whether the dissemination of personalized acoustic models can leak personal information. In this paper, we show that it is possible to retrieve not only the speaker's gender but also their identity by exploiting only the weight matrix changes of a neural acoustic model locally adapted to this speaker. Incidentally, we observe phenomena that may be useful towards explainability of deep neural networks in the context of speech processing. Gender can be identified almost certainly using only the first layers, and speaker verification performs well when using middle-up layers. Our experimental study on the TED-LIUM 3 dataset with HMM/TDNN models shows an accuracy of 95% for gender detection and an equal error rate (EER) of 9.07% for a speaker verification task, obtained by exploiting only the weights of personalized models that could be exchanged instead of user data.
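
To make the idea of "exploiting the weight matrix changes" concrete, the hypothetical sketch below represents each personalized model by the flattened difference between its weights and the shared base model for selected layers, and scores a verification trial by cosine similarity between two such difference vectors. This is an assumed, deliberately simple scoring scheme, not the authors' attack model; the layer names, sizes and the simulated adaptation (a speaker-specific shift plus small noise) are all invented for illustration.

```python
import numpy as np


def weight_delta(base, adapted, layer_names):
    """Concatenate flattened (W_adapted - W_base) for the selected layers."""
    return np.concatenate([(adapted[n] - base[n]).ravel() for n in layer_names])


def cosine_score(delta_a, delta_b):
    """Cosine similarity between two weight-change vectors (higher = more similar)."""
    denom = np.linalg.norm(delta_a) * np.linalg.norm(delta_b)
    return 0.0 if denom == 0.0 else float(np.dot(delta_a, delta_b) / denom)


rng = np.random.default_rng(0)
shapes = {"tdnn1": (64, 40), "tdnn5": (64, 64)}  # hypothetical layer names and sizes
base = {n: rng.standard_normal(s) for n, s in shapes.items()}


def adapt(speaker_shift):
    """Simulate local adaptation: a speaker-specific shift plus small update noise."""
    return {n: base[n] + speaker_shift[n] + 0.1 * rng.standard_normal(s)
            for n, s in shapes.items()}


shift_a = {n: rng.standard_normal(s) for n, s in shapes.items()}
shift_b = {n: rng.standard_normal(s) for n, s in shapes.items()}
layers = ["tdnn5"]  # hypothetical choice of a middle-up layer
enroll_a = weight_delta(base, adapt(shift_a), layers)
test_a = weight_delta(base, adapt(shift_a), layers)
test_b = weight_delta(base, adapt(shift_b), layers)
same = cosine_score(enroll_a, test_a)
diff = cosine_score(enroll_a, test_b)
print(f"same speaker: {same:.3f}, different speaker: {diff:.3f}")
```

Scores computed this way over many same-speaker and different-speaker pairs could then be summarized with an EER, which is the kind of figure the abstract reports.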

Privacy attacks for automatic speech recognition acoustic models in a federated learning framework

Nov 06, 2021

Natalia Tomashenko, Salima Mdhaffar, Marc Tommasi, Yannick Estève, Jean-François Bonastre

Abstract: This paper investigates methods to effectively retrieve speaker information from personalized, speaker-adapted neural network acoustic models (AMs) in automatic speech recognition (ASR). This problem is especially important in the context of federated learning of ASR acoustic models, where a global model is learnt on the server based on the updates received from multiple clients. We propose an approach to analyze information in neural network AMs based on a neural network footprint on the so-called Indicator dataset. Using this method, we develop two attack models that aim to infer speaker identity from the updated personalized models without access to the actual users' speech data. Experiments on the TED-LIUM 3 corpus demonstrate that the proposed approaches are very effective, achieving an equal error rate (EER) of 1-2%.

* Submitted to ICASSP 2022

The VoicePrivacy 2020 Challenge: Results and findings

Sep 01, 2021

Natalia Tomashenko, Xin Wang, Emmanuel Vincent, Jose Patino, Brij Mohan Lal Srivastava, Paul-Gauthier Noé, Andreas Nautsch, Nicholas Evans, Junichi Yamagishi, Benjamin O'Brien(+4 more)

Abstract: This paper presents the results and analyses stemming from VoicePrivacy 2020, the first VoicePrivacy Challenge, which focuses on developing anonymization solutions for speech technology. We provide a systematic overview of the challenge design with an analysis of submitted systems and evaluation results. In particular, we describe the voice anonymization task and the datasets used for system development and evaluation. We also present different attack models and the associated objective and subjective evaluation metrics. We introduce two anonymization baselines and provide a summary description of the anonymization systems developed by the challenge participants. We report objective and subjective evaluation results for baseline and submitted systems. In addition, we present experimental results for alternative privacy metrics and attack models developed as part of the post-evaluation analysis. Finally, we summarize our insights and observations that will influence the design of the next edition of the VoicePrivacy Challenge, as well as some directions for future voice anonymization research.

* Submitted to the Special Issue on Voice Privacy (Computer Speech and Language Journal - Elsevier); under review

Benchmarking and challenges in security and privacy for voice biometrics

Sep 01, 2021

Jean-François Bonastre, Hector Delgado, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Xuechen Liu, Andreas Nautsch, Paul-Gauthier Noé, Jose Patino, Md Sahidullah(+6 more)

Abstract: For many decades, research in speech technologies has focused upon improving reliability. With this now meeting user expectations for a range of diverse applications, speech technology is today omnipresent. As a result, a focus on security and privacy has now come to the fore. Here, the research effort is in its relative infancy, and progress calls for greater multidisciplinary collaboration with security, privacy, legal and ethical experts, among others. Such collaboration is now underway. To help catalyse these efforts, this paper provides a high-level overview of some related research. It targets a non-speech audience and describes the benchmarking methodology that has spearheaded progress in traditional research and which now drives recent security and privacy initiatives related to voice biometrics. We describe the ASVspoof challenge, which relates to the development of spoofing countermeasures, and the VoicePrivacy initiative, which promotes research in anonymisation for privacy preservation.

* Submitted to the symposium of the ISCA Security & Privacy in Speech Communications (SPSC) special interest group

LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech

Apr 23, 2021

Solene Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet(+8 more)

Abstract: Self-supervised learning (SSL) using huge amounts of unlabeled data has been successfully explored for image and natural language processing. Recent work has also investigated SSL from speech and was notably successful in improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient speech systems, their evaluation was mostly carried out on ASR, using multiple and heterogeneous experimental settings (most of them for English). This makes the objective comparison between SSL approaches, and the evaluation of their impact on building speech systems, difficult. In this paper, we propose LeBenchmark: a reproducible framework for assessing SSL from speech. It includes not only ASR (high- and low-resource) tasks but also spoken language understanding, speech translation and emotion recognition. We also target speech technologies in a language other than English: French. SSL models of different sizes are trained from carefully sourced and documented datasets. Experiments show that SSL is beneficial for most, but not all, tasks, which confirms the need for exhaustive and reliable benchmarks to evaluate its real impact. LeBenchmark is shared with the scientific community for reproducible research in SSL from speech.

* Submitted to Interspeech 2021

Speaker anonymisation using the McAdams coefficient

Nov 02, 2020

Jose Patino, Natalia Tomashenko, Massimiliano Todisco, Andreas Nautsch, Nicholas Evans

Abstract: Anonymisation has the goal of manipulating speech signals in order to degrade the reliability of automatic approaches to speaker recognition, while preserving other aspects of speech, such as those relating to intelligibility and naturalness. This paper reports an approach to anonymisation that, unlike other current approaches, requires no training data, is based upon well-known signal processing techniques, and is both efficient and effective. The proposed solution uses the McAdams coefficient to transform the spectral envelope of speech signals. Results derived using common VoicePrivacy 2020 databases and protocols show that random, optimised transformations can outperform competing solutions in terms of anonymisation while causing only modest additional degradation to intelligibility, even in the case of a semi-informed privacy adversary.

* Submitted to ICASSP 2021
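
One common way to apply a McAdams coefficient to the spectral envelope is through linear predictive coding (LPC): each complex pole of a frame's LPC filter keeps its magnitude while its angle φ (in radians) is replaced by φ^α, and the frame is resynthesized by filtering the LPC residual with the modified filter. The sketch below shows only the pole-warping step for a single frame, assuming that formulation; framing, LPC estimation, residual filtering and overlap-add are omitted, the function name is hypothetical, and restricting α to (0, 1] (which keeps warped angles within (0, π]) is an assumption of this sketch.

```python
import numpy as np


def mcadams_warp_lpc(lpc_coeffs, alpha):
    """Warp the pole angles of an LPC polynomial A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Each complex pole keeps its magnitude while its angle phi (in radians) becomes
    phi ** alpha; real poles are left unchanged.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("this sketch assumes 0 < alpha <= 1")
    poles = np.roots(np.asarray(lpc_coeffs, dtype=float))
    warped = []
    for p in poles:
        if abs(p.imag) < 1e-12:
            warped.append(p)  # real pole (angle 0 or pi): no formant to shift
        else:
            angle = np.angle(p)
            # Raise |angle| to alpha and restore the sign, so conjugate pairs stay conjugate
            new_angle = np.sign(angle) * np.abs(angle) ** alpha
            warped.append(np.abs(p) * np.exp(1j * new_angle))
    # Rebuild the (real) polynomial from the modified poles
    return np.real(np.poly(warped))


# Hypothetical 3rd-order frame: one resonance at 0.5 rad plus one real pole
a = np.real(np.poly([0.9 * np.exp(0.5j), 0.9 * np.exp(-0.5j), 0.5]))
a_anon = mcadams_warp_lpc(a, alpha=0.8)
print(np.angle(np.roots(a_anon)))
```

Because pole magnitudes are untouched, formant bandwidths and filter stability are preserved while formant positions move; with α < 1, angles below 1 rad move up and angles above 1 rad move down.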

ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020

May 24, 2020

Maha Elbayad, Ha Nguyen, Fethi Bougares, Natalia Tomashenko, Antoine Caubrière, Benjamin Lecouteux, Yannick Estève, Laurent Besacier

Abstract: This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2020: offline speech translation and simultaneous speech translation. The ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). Attention-based encoder-decoder models, trained end-to-end, were used for our submissions to the offline speech translation track. Our contributions focused on data augmentation and ensembling of multiple models. In the simultaneous speech translation track, we build on Transformer-based wait-k models for the text-to-text subtask. For speech-to-text simultaneous translation, we attach a wait-k MT system to a hybrid ASR system. We propose an algorithm to control the latency of the ASR+MT cascade and achieve a good latency-quality trade-off on both subtasks.
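
A wait-k policy, as commonly defined, first reads k source tokens and then alternates between writing one target token and reading one more source token; once the source is exhausted, it writes the remaining target tokens. The sketch below computes that READ/WRITE schedule for known source and target lengths. In a real decoder the target length is not known in advance and decoding stops at an end-of-sentence token. The sketch does not reproduce the ON-TRAC latency-control algorithm for the ASR+MT cascade, which the abstract does not detail, and the function name is hypothetical.

```python
from typing import List


def wait_k_actions(source_len: int, target_len: int, k: int) -> List[str]:
    """READ/WRITE schedule of a wait-k policy for known sequence lengths."""
    if k < 1:
        raise ValueError("k must be >= 1")
    actions: List[str] = []
    read = 0
    for t in range(1, target_len + 1):
        # Target token t may be written once min(k + t - 1, source_len) tokens are read
        needed = min(k + t - 1, source_len)
        while read < needed:
            actions.append("READ")
            read += 1
        actions.append("WRITE")
    return actions


print(" ".join(wait_k_actions(5, 4, k=2)))
# prints: READ READ WRITE READ WRITE READ WRITE READ WRITE
```

Larger k gives the model more source context per target token (better quality) at the cost of higher latency; k at least as large as the source length degenerates to offline, full-sentence translation.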
