Yiqing Xie

Reinforcing Human Behavior Simulation via Verbal Feedback

May 19, 2026

Mind the Sim2Real Gap in User Simulation for Agentic Tasks

Mar 11, 2026

Hybrid-Gym: Training Coding Agents to Generalize Across Tasks

Feb 18, 2026

An Empirical Study on Strong-Weak Model Collaboration for Repo-level Code Generation

May 26, 2025

RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing

Mar 10, 2025

TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

Dec 18, 2024

Improving Model Factuality with Fine-grained Critique-based Evaluator

Oct 24, 2024

CodeRAG-Bench: Can Retrieval Augment Code Generation?

Jun 20, 2024

CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks

Mar 31, 2024

Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries

Mar 01, 2024