Mohamed Elhoseiny

Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in

Dec 16, 2025
Via arXiv

A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality

Jul 09, 2025
Via arXiv

MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks

Jun 08, 2025
Via arXiv

Time Blindness: Why Video-Language Models Can't See What Humans Can?

May 30, 2025
Via arXiv

VR-RAG: Open-vocabulary Species Recognition with RAG-Assisted Large Multi-Modal Models

May 08, 2025
Via arXiv

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

Apr 22, 2025
Via arXiv

Query-based Knowledge Transfer for Heterogeneous Learning Environments

Apr 12, 2025
Via arXiv

Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs

Mar 29, 2025
Via arXiv

WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation

Mar 24, 2025
Via arXiv

3DCoMPaT200: Language-Grounded Compositional Understanding of Parts and Materials of 3D Shapes

Jan 12, 2025
Via arXiv