Cuiyun Gao

AXIOM: Benchmarking LLM-as-a-Judge for Code via Rule-Based Perturbation and Multisource Quality Calibration

Dec 23, 2025
Via arXiv

Benchmarking LLMs for Fine-Grained Code Review with Enriched Context in Practice

Nov 10, 2025
Via arXiv

Boosting Vulnerability Detection of LLMs via Curriculum Preference Optimization with Synthetic Reasoning Data

Jun 09, 2025
Via arXiv

Exploring the Landscape of Text-to-SQL with Large Language Models: Progresses, Challenges and Opportunities

May 28, 2025
Via arXiv

LFTF: Locating First and Then Fine-Tuning for Mitigating Gender Bias in Large Language Models

May 21, 2025
Via arXiv

CodeVisionary: An Agent-based Framework for Evaluating Large Language Models in Code Generation

Apr 18, 2025
Via arXiv

An LLM-based Agent for Reliable Docker Environment Configuration

Feb 19, 2025
Via arXiv

Can LLMs Replace Human Evaluators? An Empirical Study of LLM-as-a-Judge in Software Engineering

Feb 10, 2025
Via arXiv

The Prompt Alchemist: Automated LLM-Tailored Prompt Optimization for Test Case Generation

Jan 02, 2025
Via arXiv

CodeRepoQA: A Large-scale Benchmark for Software Engineering Question Answering

Dec 19, 2024
Via arXiv