Tao Yuan

and Other Contributors

A Simple Linear Patch Revives Layer-Pruned Large Language Models

May 30, 2025

Pangu Light: Weight Re-Initialization for Pruning and Accelerating LLMs

May 26, 2025

Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL

May 21, 2025

Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning

May 06, 2025

Iterative Trajectory Exploration for Multimodal Agents

Apr 30, 2025

TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials

Apr 17, 2025

Megrez-Omni Technical Report

Feb 19, 2025

Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization

Dec 25, 2024

The Key of Understanding Vision Tasks: Explanatory Instructions

Dec 24, 2024

Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage

Dec 20, 2024