
Zhaoyang Liu

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Aug 25, 2025

Omni-SafetyBench: A Benchmark for Safety Evaluation of Audio-Visual Large Language Models

Aug 10, 2025

Less is More: Unlocking Specialization of Time Series Foundation Models via Structured Pruning

May 29, 2025

ZeroGUI: Automating Online GUI Learning at Zero Human Cost

May 29, 2025

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

May 26, 2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Apr 15, 2025

EmbodiedVSR: Dynamic Scene Graph-Guided Chain-of-Thought Reasoning for Visual Spatial Tasks

Mar 14, 2025

AudioX: Diffusion Transformer for Anything-to-Audio Generation

Mar 13, 2025

ModelGrow: Continual Text-to-Video Pre-training with Model Expansion and Language Understanding Enhancement

Dec 25, 2024

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Dec 06, 2024