Xiangyu Yue

In-Context Audio Control of Video Diffusion Transformers

Dec 21, 2025
Via arXiv

AdaTooler-V: Adaptive Tool-Use for Images and Videos

Dec 19, 2025
Via arXiv

QuadSentinel: Sequent Safety for Machine-Checkable Control in Multi-agent Systems

Dec 18, 2025
Via arXiv

OS-Oracle: A Comprehensive Framework for Cross-Platform GUI Critic Models

Dec 18, 2025
Via arXiv

SpaceVista: All-Scale Visual Spatial Reasoning from mm to km

Oct 10, 2025
Via arXiv

Growing Visual Generative Capacity for Pre-Trained MLLMs

Oct 02, 2025
Via arXiv

ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

Sep 18, 2025
Via arXiv

Transition Models: Rethinking the Generative Learning Objective

Sep 04, 2025
Via arXiv

ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

Jul 30, 2025
Via arXiv

MSR-Align: Policy-Grounded Multimodal Alignment for Safety-Aware Reasoning in Vision-Language Models

Jun 24, 2025
Via arXiv