Abdelrahman Mohamed

LLMs Can Compensate for Deficiencies in Visual Representations

Jun 05, 2025

VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation

May 26, 2025

JEEM: Vision-Language Understanding in Four Arabic Dialects

Mar 27, 2025

Casablanca: Data and Models for Multidialectal Arabic Speech Recognition

Oct 06, 2024

fCOP: Focal Length Estimation from Category-level Object Priors

Sep 29, 2024

A Large-Scale Evaluation of Speech Foundation Models

Apr 15, 2024

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

Mar 25, 2024

Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks

Mar 01, 2024

SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering

Jan 24, 2024

Violet: A Vision-Language Model for Arabic Image Captioning with Gemini Decoder

Nov 15, 2023