Zilong Huang

Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation

Feb 02, 2026

ThinkGen: Generalized Thinking for Visual Generation

Dec 29, 2025

CodeDance: A Dynamic Tool-integrated MLLM for Executable Visual Reasoning

Dec 19, 2025

SuperCLIP: CLIP with Simple Classification Supervision

Dec 16, 2025

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Oct 22, 2025

UrbanFeel: A Comprehensive Benchmark for Temporal and Perceptual Understanding of City Scenes through Human Perspective

Sep 26, 2025

Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation

Aug 13, 2025

Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology

Jul 10, 2025

Seed1.5-VL Technical Report

May 11, 2025

Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding

Apr 14, 2025