Bowen Yang

Charles

A Training-Free Guess What Vision Language Model from Snippets to Open-Vocabulary Object Detection

Jan 21, 2026

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

Jan 12, 2026

OS-Oracle: A Comprehensive Framework for Cross-Platform GUI Critic Models

Dec 18, 2025

Translating Informal Proofs into Formal Proofs Using a Chain of States

Dec 12, 2025

LiePrune: Lie Group and Quantum Geometric Dual Representation for One-Shot Structured Pruning of Quantum Neural Networks

Dec 10, 2025

ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

Sep 18, 2025

OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning

Sep 11, 2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Aug 25, 2025

MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention

Jul 30, 2025

Can GPT tell us why these images are synthesized? Empowering Multimodal Large Language Models for Forensics

Apr 16, 2025