Ahmed Hussen Abdelaziz

A Variational Framework for Improving Naturalness in Generative Spoken Language Models

Jun 17, 2025

Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models

Sep 16, 2024

Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels

Sep 16, 2024

Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection

Jun 13, 2024

Comparative Analysis of Personalized Voice Activity Detection Systems: Assessing Real-World Effectiveness

Jun 12, 2024

Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?

Feb 01, 2024

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models

Jan 30, 2024

Modality Dropout for Multimodal Device Directed Speech Detection using Verbal and Non-Verbal Features

Oct 23, 2023

Audiovisual Speech Synthesis using Tacotron2

Aug 03, 2020

Modality Dropout for Improved Performance-driven Talking Faces

May 27, 2020