Yunxin Li

AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation

Jun 12, 2025
Via arXiv

VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization

May 25, 2025
Via arXiv

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

May 08, 2025
Via arXiv

VideoVista-CulturalLingo: 360° Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension

Apr 23, 2025
Via arXiv

Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents

Feb 27, 2025
Via arXiv

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Jan 21, 2025
Via arXiv

Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation

Aug 19, 2024
Via arXiv

VideoVista: A Versatile Benchmark for Video Understanding and Reasoning

Jun 17, 2024
Via arXiv

Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

May 18, 2024
Via arXiv

VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context

May 08, 2024
Via arXiv