Yixiao Ge

ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts

Jul 28, 2025

LoRA-Gen: Specializing Large Language Model via Online LoRA Generation

Jun 13, 2025

Aligning Latent Spaces with Flow Priors

Jun 05, 2025

Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?

May 27, 2025

TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation

May 08, 2025

AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction

Apr 01, 2025

Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1

Mar 31, 2025

GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers

Mar 25, 2025

Equivariant Filter Design for Range-only SLAM

Mar 05, 2025

Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation

Dec 05, 2024