Wenjun Li

VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex Environments

Dec 19, 2025

Doc-Researcher: A Unified System for Multimodal Document Parsing and Deep Research

Oct 24, 2025

Understanding R1-Zero-Like Training: A Critical Perspective

Mar 26, 2025

Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k

Mar 12, 2025

Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger

Feb 18, 2025

Improving Environment Novelty Quantification for Effective Unsupervised Environment Design

Feb 08, 2025

A Survey of Foundation Models for Music Understanding

Sep 15, 2024

A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks

Aug 02, 2024

Unlocking Large Language Model's Planning Capabilities with Maximum Diversity Fine-tuning

Jun 15, 2024

Controllable Talking Face Generation by Implicit Facial Keypoints Editing

Jun 05, 2024