Mohamed Elhoseiny

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

Apr 22, 2025

Query-based Knowledge Transfer for Heterogeneous Learning Environments

Apr 12, 2025

Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs

Mar 29, 2025

WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation

Mar 24, 2025

3DCoMPaT200: Language-Grounded Compositional Understanding of Parts and Materials of 3D Shapes

Jan 12, 2025

AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs

Jan 03, 2025

Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents

Nov 23, 2024

No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages

Nov 06, 2024

AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?

Oct 29, 2024

LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding

Oct 22, 2024