
Zilong Huang

Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation

Aug 13, 2025
Via arXiv

Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology

Jul 10, 2025
Via arXiv

Seed1.5-VL Technical Report

May 11, 2025
Via arXiv

Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding

Apr 14, 2025
Via arXiv

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer

Apr 14, 2025
Via arXiv

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

Apr 11, 2025
Via arXiv

GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation

Apr 03, 2025
Via arXiv

Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration

Apr 01, 2025
Via arXiv

4th PVUW MeViS 3rd Place Report: Sa2VA

Apr 01, 2025
Via arXiv

Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

Jan 21, 2025
Via arXiv