Bocheng Zou

Agent Skills Should Go Beyond Text: The Case for Visual Skills

May 31, 2026

MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents

May 18, 2026

Chrono-Gymnasium: An Open-Source, Gymnasium-Compatible Distributed Simulation Framework

May 14, 2026

Coding Agent Is Good As World Simulator

May 14, 2026

MuRF: Unlocking the Multi-Scale Potential of Vision Foundation Models

Mar 26, 2026

UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens

May 20, 2025

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

Oct 15, 2024

VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation

Jul 15, 2024

LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model

May 03, 2024

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

Apr 01, 2024