Xin Peng

Does Pass Rate Tell the Whole Story? Evaluating Design Constraint Compliance in LLM-based Issue Resolution

Apr 07, 2026
Via arXiv

AmbiBench: Benchmarking Mobile GUI Agents Beyond One-Shot Instructions in the Wild

Feb 12, 2026
Via arXiv

Flow Perturbation++: Multi-Step Unbiased Jacobian Estimation for High-Dimensional Boltzmann Sampling

Jan 29, 2026
Via arXiv

Rethinking Refinement: Correcting Generative Bias without Noise Injection

Jan 29, 2026
Via arXiv

Reducing False Positives in Static Bug Detection with LLMs: An Empirical Study in Industry

Jan 26, 2026
Via arXiv

CI4A: Semantic Component Interfaces for Agents Empowering Web Automation

Jan 21, 2026
Via arXiv

ExpertAD: Enhancing Autonomous Driving Systems with Mixture of Experts

Nov 13, 2025
Via arXiv

Argus: Resilience-Oriented Safety Assurance Framework for End-to-End ADSs

Nov 12, 2025
Via arXiv

Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs

May 26, 2025
Via arXiv

Code Copycat Conundrum: Demystifying Repetition in LLM-based Code Generation

Apr 17, 2025
Via arXiv