Xiaoyi Zhang

UI-Evol: Automatic Knowledge Evolving for Computer Use Agents

May 28, 2025

Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding

May 23, 2025

UI-E2I-Synth: Advancing GUI Grounding with Large-Scale Instruction Synthesis

Apr 16, 2025

Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis

May 13, 2024

Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference

Apr 03, 2024

GO-FEAP: Global Optimal UAV Planner Using Frontier-Omission-Aware Exploration and Altitude-Stratified Planning

Oct 24, 2023

Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API

Oct 07, 2023

Responsible Task Automation: Empowering Large Language Models as Responsible Task Automators

Jun 02, 2023

Unifying Layout Generation with a Decoupled Diffusion Model

Mar 09, 2023

Understanding Mobile GUI: from Pixel-Words to Screen-Sentences

May 25, 2021