Picture for Kaiwen Zhou

Kaiwen Zhou

Uncertainty-Aware GUI Agent: Adaptive Perception through Component Recommendation and Human-in-the-Loop Refinement

Add code
Aug 06, 2025
Viaarxiv icon

"PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models

Add code
Jul 17, 2025
Viaarxiv icon

Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills

Add code
Jun 12, 2025
Viaarxiv icon

GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent

Add code
May 22, 2025
Viaarxiv icon

SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning

Add code
May 22, 2025
Viaarxiv icon

GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents

Add code
May 21, 2025
Viaarxiv icon

VideoAgent2: Enhancing the LLM-Based Agent System for Long-Form Video Understanding by Uncertainty-Aware CoT

Add code
Apr 06, 2025
Viaarxiv icon

Generative Models in Decision Making: A Survey

Add code
Feb 25, 2025
Viaarxiv icon

The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1

Add code
Feb 18, 2025
Viaarxiv icon

FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers

Add code
Jan 27, 2025
Figure 1 for FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers
Figure 2 for FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers
Figure 3 for FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers
Figure 4 for FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers
Viaarxiv icon