Mohamed Elhoseiny

dTRPO: Trajectory Reduction in Policy Optimization of Diffusion Large Language Models

Mar 19, 2026
Via arXiv

InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions

Mar 04, 2026
Via arXiv

From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors

Feb 25, 2026
Via arXiv

XProvence: Zero-Cost Multilingual Context Pruning for Retrieval-Augmented Generation

Jan 26, 2026
Via arXiv

Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in

Dec 16, 2025
Via arXiv

A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality

Jul 09, 2025
Via arXiv

MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks

Jun 08, 2025
Via arXiv

Time Blindness: Why Video-Language Models Can't See What Humans Can?

May 30, 2025
Via arXiv

VR-RAG: Open-vocabulary Species Recognition with RAG-Assisted Large Multi-Modal Models

May 08, 2025
Via arXiv

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

Apr 22, 2025
Via arXiv