Picture for Wenhao Chai

Wenhao Chai

Next-Embedding Prediction Makes Strong Vision Learners

Add code
Dec 23, 2025
Viaarxiv icon

FrontierCS: Evolving Challenges for Evolving Intelligence

Add code
Dec 17, 2025
Viaarxiv icon

HybridToken-VLM: Hybrid Token Compression for Vision-Language Models

Add code
Dec 09, 2025
Viaarxiv icon

UniHPR: Unified Human Pose Representation via Singular Value Contrastive Learning

Add code
Oct 21, 2025
Viaarxiv icon

VideoNSA: Native Sparse Attention Scales Video Understanding

Add code
Oct 02, 2025
Figure 1 for VideoNSA: Native Sparse Attention Scales Video Understanding
Figure 2 for VideoNSA: Native Sparse Attention Scales Video Understanding
Figure 3 for VideoNSA: Native Sparse Attention Scales Video Understanding
Figure 4 for VideoNSA: Native Sparse Attention Scales Video Understanding
Viaarxiv icon

Dense Video Understanding with Gated Residual Tokenization

Add code
Sep 18, 2025
Viaarxiv icon

AuroraLong: Bringing RNNs Back to Efficient Open-Ended Video Understanding

Add code
Jul 03, 2025
Viaarxiv icon

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

Add code
Jun 13, 2025
Viaarxiv icon

GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning

Add code
May 29, 2025
Viaarxiv icon

Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model

Add code
May 29, 2025
Viaarxiv icon