Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Massimiliano Todisco

A Comparison of Differential Performance Metrics for the Evaluation of Automatic Speaker Verification Fairness

Apr 27, 2024

Oubaida Chouchane, Christoph Busch, Chiara Galdi, Nicholas Evans, Massimiliano Todisco

Figure 1 for A Comparison of Differential Performance Metrics for the Evaluation of Automatic Speaker Verification Fairness

Figure 2 for A Comparison of Differential Performance Metrics for the Evaluation of Automatic Speaker Verification Fairness

Figure 3 for A Comparison of Differential Performance Metrics for the Evaluation of Automatic Speaker Verification Fairness

Figure 4 for A Comparison of Differential Performance Metrics for the Evaluation of Automatic Speaker Verification Fairness

Abstract:When decisions are made and when personal data is treated by automated processes, there is an expectation of fairness -- that members of different demographic groups receive equitable treatment. This expectation applies to biometric systems such as automatic speaker verification (ASV). We present a comparison of three candidate fairness metrics and extend previous work performed for face recognition, by examining differential performance across a range of different ASV operating points. Results show that the Gini Aggregation Rate for Biometric Equitability (GARBE) is the only one which meets three functional fairness measure criteria. Furthermore, a comprehensive evaluation of the fairness and verification performance of five state-of-the-art ASV systems is also presented. Our findings reveal a nuanced trade-off between fairness and verification accuracy underscoring the complex interplay between system design, demographic inclusiveness, and verification reliability.

* 8 pages, 7 figures

Via

Access Paper or Ask Questions

The VoicePrivacy 2024 Challenge Evaluation Plan

Apr 03, 2024

Natalia Tomashenko, Xiaoxiao Miao, Pierre Champion, Sarina Meyer, Xin Wang, Emmanuel Vincent, Michele Panariello, Nicholas Evans, Junichi Yamagishi, Massimiliano Todisco

Figure 1 for The VoicePrivacy 2024 Challenge Evaluation Plan

Figure 2 for The VoicePrivacy 2024 Challenge Evaluation Plan

Figure 3 for The VoicePrivacy 2024 Challenge Evaluation Plan

Figure 4 for The VoicePrivacy 2024 Challenge Evaluation Plan

Abstract:The task of the challenge is to develop a voice anonymization system for speech data which conceals the speaker's voice identity while protecting linguistic content and emotional states. The organizers provide development and evaluation datasets and evaluation scripts, as well as baseline anonymization systems and a list of training resources formed on the basis of the participants' requests. Participants apply their developed anonymization systems, run evaluation scripts and submit evaluation results and anonymized speech data to the organizers. Results will be presented at a workshop held in conjunction with Interspeech 2024 to which all participants are invited to present their challenge systems and to submit additional workshop papers.

* arXiv admin note: substantial text overlap with arXiv:2203.12468

Via

Access Paper or Ask Questions

Speaker anonymization using neural audio codec language models

Sep 28, 2023

Michele Panariello, Francesco Nespoli, Massimiliano Todisco, Nicholas Evans

Figure 1 for Speaker anonymization using neural audio codec language models

Figure 2 for Speaker anonymization using neural audio codec language models

Abstract:The vast majority of approaches to speaker anonymization involve the extraction of fundamental frequency estimates, linguistic features and a speaker embedding which is perturbed to obfuscate the speaker identity before an anonymized speech waveform is resynthesized using a vocoder. Recent work has shown that x-vector transformations are difficult to control consistently: other sources of speaker information contained within fundamental frequency and linguistic features are re-entangled upon vocoding, meaning that anonymized speech signals still contain speaker information. We propose an approach based upon neural audio codecs (NACs), which are known to generate high-quality synthetic speech when combined with language models. NACs use quantized codes, which are known to effectively bottleneck speaker-related information: we demonstrate the potential of speaker anonymization systems based on NAC language modeling by applying the evaluation framework of the Voice Privacy Challenge 2022.

* Submitted to ICASSP 2024

Via

Access Paper or Ask Questions

Spoofing attack augmentation: can differently-trained attack models improve generalisation?

Sep 18, 2023

Wanying Ge, Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Nicholas Evans

Figure 1 for Spoofing attack augmentation: can differently-trained attack models improve generalisation?

Figure 2 for Spoofing attack augmentation: can differently-trained attack models improve generalisation?

Figure 3 for Spoofing attack augmentation: can differently-trained attack models improve generalisation?

Abstract:A reliable deepfake detector or spoofing countermeasure (CM) should be robust in the face of unpredictable spoofing attacks. To encourage the learning of more generaliseable artefacts, rather than those specific only to known attacks, CMs are usually exposed to a broad variety of different attacks during training. Even so, the performance of deep-learning-based CM solutions are known to vary, sometimes substantially, when they are retrained with different initialisations, hyper-parameters or training data partitions. We show in this paper that the potency of spoofing attacks, also deep-learning-based, can similarly vary according to training conditions, sometimes resulting in substantial degradations to detection performance. Nevertheless, while a RawNet2 CM model is vulnerable when only modest adjustments are made to the attack algorithm, those based upon graph attention networks and self-supervised learning are reassuringly robust. The focus upon training data generated with different attack algorithms might not be sufficient on its own to ensure generaliability; some form of spoofing attack augmentation at the algorithm level can be complementary.

* Submitted to ICASSP 2024

Via

Access Paper or Ask Questions

SynVox2: Towards a privacy-friendly VoxCeleb2 dataset

Sep 12, 2023

Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Nicholas Evans, Massimiliano Todisco, Jean-François Bonastre, Mickael Rouvier

Figure 1 for SynVox2: Towards a privacy-friendly VoxCeleb2 dataset

Figure 2 for SynVox2: Towards a privacy-friendly VoxCeleb2 dataset

Figure 3 for SynVox2: Towards a privacy-friendly VoxCeleb2 dataset

Figure 4 for SynVox2: Towards a privacy-friendly VoxCeleb2 dataset

Abstract:The success of deep learning in speaker recognition relies heavily on the use of large datasets. However, the data-hungry nature of deep learning methods has already being questioned on account the ethical, privacy, and legal concerns that arise when using large-scale datasets of natural speech collected from real human speakers. For example, the widely-used VoxCeleb2 dataset for speaker recognition is no longer accessible from the official website. To mitigate these concerns, this work presents an initiative to generate a privacy-friendly synthetic VoxCeleb2 dataset that ensures the quality of the generated speech in terms of privacy, utility, and fairness. We also discuss the challenges of using synthetic data for the downstream task of speaker verification.

* conference

Via

Access Paper or Ask Questions

Fairness and Privacy in Voice Biometrics:A Study of Gender Influences Using wav2vec 2.0

Aug 27, 2023

Oubaida Chouchane, Michele Panariello, Chiara Galdi, Massimiliano Todisco, Nicholas Evans

Abstract:This study investigates the impact of gender information on utility, privacy, and fairness in voice biometric systems, guided by the General Data Protection Regulation (GDPR) mandates, which underscore the need for minimizing the processing and storage of private and sensitive data, and ensuring fairness in automated decision-making systems. We adopt an approach that involves the fine-tuning of the wav2vec 2.0 model for speaker verification tasks, evaluating potential gender-related privacy vulnerabilities in the process. Gender influences during the fine-tuning process were employed to enhance fairness and privacy in order to emphasise or obscure gender information within the speakers' embeddings. Results from VoxCeleb datasets indicate our adversarial model increases privacy against uninformed attacks, yet slightly diminishes speaker verification performance compared to the non-adversarial model. However, the model's efficacy reduces against informed attacks. Analysis of system performance was conducted to identify potential gender biases, thus highlighting the need for further research to understand and improve the delicate interplay between utility, privacy, and equity in voice biometric systems.

* 7 pages

Via

Access Paper or Ask Questions

Vocoder drift compensation by x-vector alignment in speaker anonymisation

Jul 17, 2023

Michele Panariello, Massimiliano Todisco, Nicholas Evans

Figure 1 for Vocoder drift compensation by x-vector alignment in speaker anonymisation

Figure 2 for Vocoder drift compensation by x-vector alignment in speaker anonymisation

Figure 3 for Vocoder drift compensation by x-vector alignment in speaker anonymisation

Figure 4 for Vocoder drift compensation by x-vector alignment in speaker anonymisation

Abstract:For the most popular x-vector-based approaches to speaker anonymisation, the bulk of the anonymisation can stem from vocoding rather than from the core anonymisation function which is used to substitute an original speaker x-vector with that of a fictitious pseudo-speaker. This phenomenon can impede the design of better anonymisation systems since there is a lack of fine-grained control over the x-vector space. The work reported in this paper explores the origin of so-called vocoder drift and shows that it is due to the mismatch between the substituted x-vector and the original representations of the linguistic content, intonation and prosody. Also reported is an original approach to vocoder drift compensation. While anonymisation performance degrades as expected, compensation reduces vocoder drift substantially, offers improved control over the x-vector space and lays a foundation for the design of better anonymisation functions in the future.

* Accepted at the ISCA SPSC Symposium 2023

Via

Access Paper or Ask Questions

Differentially Private Adversarial Auto-Encoder to Protect Gender in Voice Biometrics

Jul 05, 2023

Oubaïda Chouchane, Michele Panariello, Oualid Zari, Ismet Kerenciler, Imen Chihaoui, Massimiliano Todisco, Melek Önen

Figure 1 for Differentially Private Adversarial Auto-Encoder to Protect Gender in Voice Biometrics

Figure 2 for Differentially Private Adversarial Auto-Encoder to Protect Gender in Voice Biometrics

Figure 3 for Differentially Private Adversarial Auto-Encoder to Protect Gender in Voice Biometrics

Abstract:Over the last decade, the use of Automatic Speaker Verification (ASV) systems has become increasingly widespread in response to the growing need for secure and efficient identity verification methods. The voice data encompasses a wealth of personal information, which includes but is not limited to gender, age, health condition, stress levels, and geographical and socio-cultural origins. These attributes, known as soft biometrics, are private and the user may wish to keep them confidential. However, with the advancement of machine learning algorithms, soft biometrics can be inferred automatically, creating the potential for unauthorized use. As such, it is crucial to ensure the protection of these personal data that are inherent within the voice while retaining the utility of identity recognition. In this paper, we present an adversarial Auto-Encoder--based approach to hide gender-related information in speaker embeddings, while preserving their effectiveness for speaker verification. We use an adversarial procedure against a gender classifier and incorporate a layer based on the Laplace mechanism into the Auto-Encoder architecture. This layer adds Laplace noise for more robust gender concealment and ensures differential privacy guarantees during inference for the output speaker embeddings. Experiments conducted on the VoxCeleb dataset demonstrate that speaker verification tasks can be effectively carried out while concealing speaker gender and ensuring differential privacy guarantees; moreover, the intensity of the Laplace noise can be tuned to select the desired trade-off between privacy and utility.

* 6 Pages, 3 figures

Via

Access Paper or Ask Questions

Malafide: a novel adversarial convolutive noise attack against deepfake and spoofing detection systems

Jun 13, 2023

Michele Panariello, Wanying Ge, Hemlata Tak, Massimiliano Todisco, Nicholas Evans

Figure 1 for Malafide: a novel adversarial convolutive noise attack against deepfake and spoofing detection systems

Figure 2 for Malafide: a novel adversarial convolutive noise attack against deepfake and spoofing detection systems

Figure 3 for Malafide: a novel adversarial convolutive noise attack against deepfake and spoofing detection systems

Figure 4 for Malafide: a novel adversarial convolutive noise attack against deepfake and spoofing detection systems

Abstract:We present Malafide, a universal adversarial attack against automatic speaker verification (ASV) spoofing countermeasures (CMs). By introducing convolutional noise using an optimised linear time-invariant filter, Malafide attacks can be used to compromise CM reliability while preserving other speech attributes such as quality and the speaker's voice. In contrast to other adversarial attacks proposed recently, Malafide filters are optimised independently of the input utterance and duration, are tuned instead to the underlying spoofing attack, and require the optimisation of only a small number of filter coefficients. Even so, they degrade CM performance estimates by an order of magnitude, even in black-box settings, and can also be configured to overcome integrated CM and ASV subsystems. Integrated solutions that use self-supervised learning CMs, however, are more robust, under both black-box and white-box settings.

* Accepted at INTERSPEECH 2023

Via

Access Paper or Ask Questions

Vocoder drift in x-vector-based speaker anonymization

Jun 05, 2023

Michele Panariello, Massimiliano Todisco, Nicholas Evans

Figure 1 for Vocoder drift in x-vector-based speaker anonymization

Figure 2 for Vocoder drift in x-vector-based speaker anonymization

Figure 3 for Vocoder drift in x-vector-based speaker anonymization

Abstract:State-of-the-art approaches to speaker anonymization typically employ some form of perturbation function to conceal speaker information contained within an x-vector embedding, then resynthesize utterances in the voice of a new pseudo-speaker using a vocoder. Strategies to improve the x-vector anonymization function have attracted considerable research effort, whereas vocoder impacts are generally neglected. In this paper, we show that the impact of the vocoder is substantial and sometimes dominant. The vocoder drift, namely the difference between the x-vector vocoder input and that which can be extracted subsequently from the output, is learnable and can hence be reversed by an attacker; anonymization can be undone and the level of privacy protection provided by such approaches might be weaker than previously thought. The findings call into question the focus upon x-vector anonymization, prompting the need for greater attention to vocoder impacts and stronger attack models alike.

* Accepted at INTERSPEECH 2023

Via

Access Paper or Ask Questions