Victor Zhong

ASH: Agents that Self-Hone via Embodied Learning

May 14, 2026
Via arXiv

AgentIR: Reasoning-Aware Retrieval for Deep Research Agents

Mar 05, 2026
Via arXiv

ModelTables: A Corpus of Tables about Models

Dec 18, 2025
Via arXiv

SynQuE: Estimating Synthetic Dataset Quality Without Annotations

Nov 06, 2025
Via arXiv

Grounded Test-Time Adaptation for LLM Agents

Nov 06, 2025
Via arXiv

OpenCUA: Open Foundations for Computer-Use Agents

Aug 12, 2025
Via arXiv

Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows

Nov 12, 2024
Via arXiv

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

Jul 15, 2024
Via arXiv

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Apr 11, 2024
Via arXiv

Policy Improvement using Language Feedback Models

Feb 25, 2024
Via arXiv