Abstract: The performance of a voice anonymization system is typically measured by its ability to hide the speaker's identity and to preserve the data's utility for downstream tasks. This means that the requirements an anonymization system should fulfill depend on the context in which it is used and may differ greatly between use cases. However, these use cases are rarely specified in research papers. In this paper, we study the implications of use case-specific requirements for the design of voice anonymization methods. We perform an extensive literature analysis and a user study to collect possible use cases and to understand the expectations of the general public towards such tools. Based on these studies, we propose the first taxonomy of use cases for voice anonymization and derive a set of requirements and design criteria for method development and evaluation. With this scheme, we advocate a stronger focus on use case-oriented research and development of voice anonymization systems.
Abstract: Voice anonymisation aims to conceal the voice identity of speakers in speech recordings. Privacy protection is usually estimated from the difficulty of using a speaker verification system to re-identify the speaker after anonymisation. Performance assessments therefore depend on the verification model as well as on the anonymisation system. There is hence potential for privacy protection to be overestimated when the verification system is poorly trained, perhaps with mismatched data. In this paper, we demonstrate the insidious risk of overestimating anonymisation performance and show examples of exaggerated performance reported in the literature. For the worst case we identified, performance is overestimated by 74% relative. We then introduce a means to detect when a performance assessment might be untrustworthy and show that it identifies all overestimation scenarios presented in the paper. Our solution is openly available as a fork of the 2024 VoicePrivacy Challenge evaluation toolkit.
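As an illustration of the evaluation protocol this abstract refers to (not part of the paper itself), the sketch below computes the equal error rate (EER) of a speaker verification attacker over anonymised trials. The score files and attacker labels are hypothetical: a high EER under a weak, mismatched attacker suggests strong protection that a better-trained attacker may largely undo.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """EER of ASV scores; labels are 1 for same-speaker trials, 0 for different-speaker trials."""
    order = np.argsort(scores)[::-1]            # sweep the decision threshold from high to low
    labels = np.asarray(labels)[order]
    n_tar = labels.sum()
    n_non = len(labels) - n_tar
    frr = 1.0 - np.cumsum(labels) / n_tar       # targets rejected at each threshold
    far = np.cumsum(1 - labels) / n_non         # non-targets accepted at each threshold
    idx = np.argmin(np.abs(frr - far))          # point where the two error rates cross
    return (frr[idx] + far[idx]) / 2

# Hypothetical score files: the same anonymised trials scored by two attackers.
weak = np.load("scores_weak_attacker.npy")      # ASV trained on mismatched (original) data
strong = np.load("scores_strong_attacker.npy")  # ASV retrained on anonymised data
labels = np.load("trial_labels.npy")

# A high EER under the weak attacker suggests strong protection; a stronger
# attacker may reduce it substantially, revealing the overestimation.
print(f"EER, weak attacker:   {100 * equal_error_rate(weak, labels):.1f}%")
print(f"EER, strong attacker: {100 * equal_error_rate(strong, labels):.1f}%")
```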
Abstract: The goal of voice anonymization is to modify an audio recording such that the true identity of its speaker is hidden. Research on this task is typically limited to the same English read-speech datasets; thus, the efficacy of current methods for other types of speech data remains unknown. In this paper, we present the first investigation of voice anonymization for the multilingual phenomenon of code-switching speech. We prepare two corpora for this task and propose adaptations to a multilingual anonymization model to make it applicable to code-switching speech. By testing the anonymization performance of this and two language-independent methods on the datasets, we find that only the multilingual system performs well in terms of privacy and utility preservation. Furthermore, we observe challenges in performing utility evaluations on this data because of its spontaneous character and the limited code-switching support of the multilingual speech recognition model.
Abstract: In speaker anonymization, speech recordings are modified in a way that keeps the identity of the speaker hidden. While this technology could help to protect the privacy of individuals around the globe, current research limits its reach by focusing almost exclusively on English data. In this study, we extend a state-of-the-art anonymization system to nine languages by replacing its language-dependent components with multilingual counterparts. Experiments testing the robustness of the anonymized speech against privacy attacks and measuring speech deterioration show that the system is successful overall for all languages. The results suggest that speaker embeddings trained on English data can be applied across languages, and that the anonymization performance for a language is mainly affected by the quality of the speech synthesis component used for it.
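To make the kind of system described above concrete, here is a minimal, hypothetical sketch of an embedding-based anonymization pipeline in which the language-dependent parts (content recognizer, synthesizer) are the ones swapped for multilingual counterparts. The module names are placeholders, not the paper's actual components.

```python
def anonymize(audio, asr_encoder, speaker_encoder, pseudo_speaker_pool, synthesizer):
    """Replace the original speaker identity while keeping the linguistic content."""
    content = asr_encoder(audio)                 # language-dependent in a monolingual system;
                                                 # replaced by a multilingual recognizer here
    original_embedding = speaker_encoder(audio)  # speaker embedding (e.g. trained on English data)
    # Select a pseudo-speaker embedding that is distant from the original voice.
    pseudo_embedding = pseudo_speaker_pool.sample(far_from=original_embedding)
    # A multilingual synthesizer re-generates the utterance in the target voice.
    return synthesizer(content, pseudo_embedding)
```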
Abstract: In this work, we take on the challenging task of building a single text-to-speech synthesis system that is capable of generating speech in over 7000 languages, many of which lack sufficient data for traditional TTS development. By leveraging a novel integration of massively multilingual pretraining and meta-learning to approximate language representations, our approach enables zero-shot speech synthesis in languages without any available data. We validate our system's performance through objective measures and human evaluation across a diverse linguistic landscape. By releasing our code and models publicly, we aim to empower communities with limited linguistic resources and foster further innovation in the field of speech technology.
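One simple, hypothetical way to approximate a representation for a language without data, loosely illustrating the idea above, is to combine the learned embeddings of its most similar seen languages. The similarity function and weighting below are assumptions for illustration, not the paper's meta-learning procedure.

```python
import numpy as np

def approximate_language_embedding(target, language_embeddings, similarity, k=5):
    """Weighted average of the k most similar languages that do have training data."""
    neighbours = sorted(language_embeddings,
                        key=lambda lang: similarity(target, lang), reverse=True)[:k]
    weights = np.array([similarity(target, lang) for lang in neighbours], dtype=float)
    weights /= weights.sum()                       # normalize to a convex combination
    vectors = np.stack([language_embeddings[lang] for lang in neighbours])
    return weights @ vectors                       # conditions zero-shot synthesis in the unseen language
```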
Abstract: The task of the challenge is to develop a voice anonymization system for speech data which conceals the speaker's voice identity while protecting linguistic content and emotional states. The organizers provide development and evaluation datasets, evaluation scripts, baseline anonymization systems, and a list of training resources compiled on the basis of the participants' requests. Participants apply their developed anonymization systems, run the evaluation scripts, and submit the evaluation results and anonymized speech data to the organizers. Results will be presented at a workshop held in conjunction with Interspeech 2024, to which all participants are invited to present their challenge systems and to submit additional workshop papers.
Abstract: Customizing voice and speaking style in a speech synthesis system with intuitive and fine-grained controls is challenging, given that little data with appropriate labels is available. Furthermore, editing an existing human's voice raises ethical concerns. In this paper, we propose a method to generate artificial speaker embeddings that cannot be linked to a real human while offering intuitive and fine-grained control over the voice and speaking style of the embeddings, without requiring any labels for speaker or style. The artificial, controllable embeddings can be fed to a speech synthesis system that is conditioned on embeddings of real humans during training, without sacrificing privacy during inference.
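As a rough illustration of this idea (not the paper's actual method), an artificial speaker embedding could be drawn from a generative model fitted to real embeddings and then shifted along control directions; the generator and the control axes below are placeholders.

```python
import torch

def artificial_speaker(generator, controls, adjustments, latent_dim=32):
    """Sample an embedding that matches no real speaker, then apply fine-grained controls."""
    z = torch.randn(1, latent_dim)              # random latent -> a new, non-existent voice
    embedding = generator(z)                    # generative model fitted to real speaker embeddings
    for name, amount in adjustments.items():    # e.g. {"pitch": 0.5, "speaking_rate": -0.2}
        embedding = embedding + amount * controls[name]
    return embedding                            # used as the speaker condition of the TTS
```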
Abstract: For our contribution to the Blizzard Challenge 2023, we improved upon the system we submitted to the Blizzard Challenge 2021. Our approach entails a rule-based text-to-phoneme processing system that includes rule-based disambiguation of homographs in the French language. It then transforms the phonemes into spectrograms as intermediate representations using a fast and efficient non-autoregressive synthesis architecture based on Conformer and Glow. A GAN-based neural vocoder that combines recent state-of-the-art approaches converts the spectrogram into the final waveform. We carefully designed the data processing, training, and inference procedures for the challenge data. Our system identifier is G. Open-source code and a demo are available.
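The staged structure of this pipeline can be summarized by the following hypothetical sketch, with placeholder component names rather than the system's real interfaces.

```python
def synthesize(text, phonemizer, acoustic_model, vocoder):
    """Three-stage TTS pipeline: text -> phonemes -> spectrogram -> waveform."""
    phonemes = phonemizer(text)              # rule-based text-to-phoneme conversion,
                                             # including homograph disambiguation for French
    spectrogram = acoustic_model(phonemes)   # non-autoregressive Conformer/Glow acoustic model
    waveform = vocoder(spectrogram)          # GAN-based neural vocoder
    return waveform
```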
Abstract: Speaker anonymization is the task of modifying a speech recording such that the original speaker can no longer be identified. Since the first VoicePrivacy Challenge in 2020, along with the release of its framework, the popularity of this research topic has been continually increasing. However, the comparison and combination of different anonymization approaches remain challenging due to the complexity of evaluation and the absence of user-friendly research frameworks. We therefore propose an efficient speaker anonymization and evaluation framework based on a modular and easily extendable structure, written almost fully in Python. The framework facilitates the orchestration of several anonymization approaches in parallel and allows for interfacing between different techniques. Furthermore, we propose modifications to common evaluation methods which make the evaluation more powerful and reduce its computation time by 65 to 95%, depending on the metric. Our code is fully open source.
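The modular idea can be pictured with a minimal, hypothetical interface in which anonymization and evaluation steps are interchangeable components behind a common API; the class and method names below are illustrative, not the framework's actual ones.

```python
class Module:
    """Common interface: every anonymization or evaluation step consumes and returns a batch."""
    def process(self, batch):
        raise NotImplementedError

class Pipeline:
    """Chains interchangeable modules so different approaches can be combined or swapped."""
    def __init__(self, *modules: Module):
        self.modules = modules

    def run(self, batch):
        for module in self.modules:          # each module consumes the previous module's output
            batch = module.process(batch)
        return batch

# e.g. Pipeline(ContentExtractor(), SpeakerEmbeddingAnonymizer(), Synthesizer()).run(data)
```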
Abstract: We present our latest findings on backchannel modeling, newly motivated by the canonical use of the minimal responses Yeah and Uh-huh in English and their corresponding tokens in German, and by the effect of encoding the speaker-listener interaction. Backchanneling theories emphasize the active and continuous role of the listener in the course of the conversation, their effect on the speaker's subsequent talk, and the consequent dynamic speaker-listener interaction. Therefore, we propose a neural acoustic backchannel classifier for minimal responses that processes acoustic features of the speaker's speech, captures and imitates the listener's backchanneling behavior, and encodes the speaker-listener interaction. Our experimental results on the Switchboard and GECO datasets reveal that in almost all tested scenarios the speaker or listener behavior embeddings help the model make more accurate backchannel predictions. More importantly, a proper interaction encoding strategy, i.e., combining the speaker and listener embeddings, leads to the best performance on both datasets in terms of F1-score.
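The interaction-encoding strategy can be illustrated with a small, hypothetical classifier that concatenates speaker and listener behavior embeddings with a summary of the speaker's acoustics; the dimensions, layer sizes, and label set below are illustrative, not those of the paper.

```python
import torch
import torch.nn as nn

class BackchannelClassifier(nn.Module):
    """Combines speaker acoustics with speaker and listener embeddings by concatenation."""
    def __init__(self, acoustic_dim=128, embed_dim=32, hidden=64, n_classes=3):
        super().__init__()
        self.encoder = nn.GRU(acoustic_dim, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden + 2 * embed_dim, hidden),  # acoustic summary + speaker + listener
            nn.ReLU(),
            nn.Linear(hidden, n_classes),               # e.g. no backchannel / Yeah / Uh-huh
        )

    def forward(self, acoustics, speaker_emb, listener_emb):
        _, state = self.encoder(acoustics)              # summarize the speaker's speech segment
        features = torch.cat([state[-1], speaker_emb, listener_emb], dim=-1)
        return self.head(features)                      # backchannel prediction logits
```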