Daoguang Zan

AInsteinBench: Benchmarking Coding Agents on Scientific Repositories

Dec 24, 2025

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Dec 14, 2025

Virtual Width Networks

Nov 17, 2025

Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

Apr 03, 2025

CodeV: Issue Resolving with Visual Data

Dec 23, 2024

Aligning CodeLLMs with Direct Preference Optimization

Oct 24, 2024

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models

Oct 10, 2024

Towards a Unified View of Preference Learning for Large Language Models: A Survey

Sep 04, 2024

SWE-bench-java: A GitHub Issue Resolving Benchmark for Java

Aug 26, 2024

The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models

Jun 14, 2024