Kaichen Zhang

LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence

May 25, 2026

ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning

May 21, 2026

Senses Wide Shut: A Representation-Action Gap in Omnimodal LLMs

May 13, 2026

Streaming Multi-agent Pathfinding

May 14, 2025

GVPO: Group Variance Policy Optimization for Large Language Model Post-Training

Apr 28, 2025

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models

Nov 22, 2024

MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures

Oct 17, 2024

LLaVA-OneVision: Easy Visual Task Transfer

Aug 06, 2024

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

Jul 17, 2024

Long Context Transfer from Language to Vision

Jun 24, 2024