Carolyn Rose

MetaLint: Generalizable Idiomatic Code Quality Analysis through Instruction-Following and Easy-to-Hard Generalization

Jul 15, 2025
Via arXiv

PBEBench: A Multi-Step Programming by Examples Reasoning Benchmark inspired by Historical Linguistics

May 29, 2025
Via arXiv

An Empirical Study on Strong-Weak Model Collaboration for Repo-level Code Generation

May 26, 2025
Via arXiv

Where is this coming from? Making groundedness count in the evaluation of Document VQA models

Mar 24, 2025
Via arXiv

RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing

Mar 10, 2025
Via arXiv

Programming by Examples Meets Historical Linguistics: A Large Language Model Based Approach to Sound Law Induction

Jan 27, 2025
Via arXiv

Improving Model Factuality with Fine-grained Critique-based Evaluator

Oct 24, 2024
Via arXiv

CRScore: Grounding Automated Evaluation of Code Review Comments in Code Claims and Smells

Sep 29, 2024
Via arXiv

CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks

Mar 31, 2024
Via arXiv

Data Augmentation for Code Translation with Comparable Corpora and Multiple References

Nov 01, 2023
Via arXiv