Atharva Naik

ChartEditBench: Evaluating Grounded Multi-Turn Chart Editing in Multimodal Language Models

Feb 17, 2026
Via arXiv

PragWorld: A Benchmark Evaluating LLMs' Local World Model under Minimal Linguistic Alterations and Conversational Dynamics

Nov 17, 2025
Via arXiv

MetaLint: Generalizable Idiomatic Code Quality Analysis through Instruction-Following and Easy-to-Hard Generalization

Jul 15, 2025
Via arXiv

PBEBench: A Multi-Step Programming by Examples Reasoning Benchmark inspired by Historical Linguistics

May 29, 2025
Via arXiv

An Empirical Study on Strong-Weak Model Collaboration for Repo-level Code Generation

May 26, 2025
Via arXiv

Programming by Examples Meets Historical Linguistics: A Large Language Model Based Approach to Sound Law Induction

Jan 27, 2025
Via arXiv

CRScore: Grounding Automated Evaluation of Code Review Comments in Code Claims and Smells

Sep 29, 2024
Via arXiv

Can Large Language Models Code Like a Linguist?: A Case Study in Low Resource Sound Law Induction

Jun 18, 2024
Via arXiv

Generating Situated Reflection Triggers about Alternative Solution Paths: A Case Study of Generative AI for Computer-Supported Collaborative Learning

Apr 28, 2024
Via arXiv

On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation

Apr 26, 2024
Via arXiv