Picture for Shafiq Joty

Shafiq Joty

What Makes a Good Natural Language Prompt?

Add code
Jun 07, 2025
Viaarxiv icon

Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning

Add code
Jun 05, 2025
Viaarxiv icon

MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision

Add code
May 26, 2025
Viaarxiv icon

Meta-Design Matters: A Self-Design Multi-Agent System

Add code
May 21, 2025
Viaarxiv icon

J4R: Learning to Judge with Equivalent Initial State Group Relative Preference Optimization

Add code
May 19, 2025
Viaarxiv icon

Learning Auxiliary Tasks Improves Reference-Free Hallucination Detection in Open-Domain Long-Form Generation

Add code
May 18, 2025
Viaarxiv icon

Judging the Judges: Can Large Vision-Language Models Fairly Evaluate Chart Comprehension and Reasoning?

Add code
May 13, 2025
Viaarxiv icon

SweRank: Software Issue Localization with Code Ranking

Add code
May 07, 2025
Viaarxiv icon

Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators

Add code
Apr 21, 2025
Viaarxiv icon

A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems

Add code
Apr 12, 2025
Viaarxiv icon