Shunyu Yao

$τ$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

Jun 17, 2024

Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion

Jun 07, 2024

DPN: Decoupling Partition and Navigation for Neural Solvers of Min-max Vehicle Routing Problems

May 27, 2024

NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

Apr 17, 2024

Can Language Models Solve Olympiad Programming?

Apr 16, 2024

DevBench: A Comprehensive Benchmark for Software Development

Mar 15, 2024

OS-Copilot: Towards Generalist Computer Agents with Self-Improvement

Feb 15, 2024

Large Language Model for Multi-objective Evolutionary Optimization

Oct 25, 2023

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Oct 10, 2023

FireAct: Toward Language Agent Fine-tuning

Oct 09, 2023