Mohamed Elhoseiny

The Second Challenge on Cross-Domain Few-Shot Object Detection at NTIRE 2026: Methods and Results

Apr 13, 2026

Small Vision-Language Models are Smart Compressors for Long Video Understanding

Apr 09, 2026

M-MiniGPT4: Multilingual VLLM Alignment via Translated Data

Mar 31, 2026

dTRPO: Trajectory Reduction in Policy Optimization of Diffusion Large Language Models

Mar 19, 2026

InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions

Mar 04, 2026

From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors

Feb 25, 2026

XProvence: Zero-Cost Multilingual Context Pruning for Retrieval-Augmented Generation

Jan 26, 2026

Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in

Dec 16, 2025

A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality

Jul 09, 2025

MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks

Jun 08, 2025